Part I. Theoretical.

Asymmetrical Frequency Curves. 

(1.) An asymmetrical frequency curve may arise from two quite distinct classes of
causes. In the first place the material measured may be heterogeneous and may
consist of a mixture of two or more homogeneous materials. Such frequency curves,
for example, arise when we have a mixed population of two different races, a
homogeneous population with a sprinkling of diseased or deformed members, a curve for
the frequency of matrimony covering more than one class of the population, or in
economics a frequency of interest curve for securities of different types of stability
(railways and government stocks mixed with mining and financial companies). The
treatment of this class of frequency curves requires us to break up the original curve
into component parts, or simple frequency curves. This branch of the subject (for
the special case of the compound being the sum of two normal curves) has been
treated in a paper presented to the Royal Society by the author, on October 18, 1893.
The second class of frequency curves arises in the case of homogeneous material 
when the tendency to deviation on one side of the mean is unequal to the tendency 
to deviation on the other side. Such curves arise in many physical, economic and 
biological investigations, for example, in frequency curves for the height of the 
barometer, in those for prices and for rates of interest of securities of the same 
class, in mortality curves, especially the percentage of deaths to cases in all kinds of
fevers, in income tax and house duty returns, and in various types of anthropological
measurements. It is this class of curves which is dealt with in the present paper.
The general type of this class of frequency curve will be found to vary (see Plate 7,
fig. 1) through all phases from the form close to the negative exponential curve:

$$y = Ce^{-px}$$

to a form close to the normal frequency curve

$$y = Ce^{-px^2},$$

where C and p are constants.

Hence any theory which is to cover the whole series of these curves must give a 
curve capable of varying from one to another of these types, i.e., from a type in
which the maximum* practically coincides with the extreme ordinate, to a type in
which it coincides with the central ordinate as in the normal frequency curve. 

It is well known that the points given by the point-binomial $(\tfrac12 + \tfrac12)^n$ coincide very
closely with the contour of a normal frequency curve when n is only moderately
large. For example, the 21 points of $(\tfrac12 + \tfrac12)^{20}$ lie most closely on a normal frequency
curve, and the author has devised a probability machine, which by continually bisecting
streams of sand or rape seed for 20 successive falls gives a good normal frequency
curve by the heights of the resulting 21 columns. Set to any ratio p : q of
division other than bisection, the machine gives the binomial $(p + q)^{20}$, or indeed any
less power and thus a wide range of asymmetrical point-binomials. Plate 7, fig. 2,
represents, diagrammatically, a 14-power binomial machine.
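The action of such a machine can be imitated numerically. The sketch below is only an idealised model and not a description of the actual apparatus: it assumes that at every fall each stream is divided in the fixed ratio p : q, the share p keeping its column and the share q passing one column over. The function names `machine_columns` and `binomial_terms` are introduced here for illustration.

```python
from math import comb


def machine_columns(p, falls):
    """Idealised machine (an assumption): each stream is split `falls` times,
    a share p keeping its column and a share q = 1 - p moving one column over.
    Returns the falls + 1 column heights for a unit supply of sand."""
    q = 1.0 - p
    columns = [1.0]
    for _ in range(falls):
        nxt = [0.0] * (len(columns) + 1)
        for i, height in enumerate(columns):
            nxt[i] += height * p       # share that stays in place
            nxt[i + 1] += height * q   # share deflected to the next column
        columns = nxt
    return columns


def binomial_terms(p, n):
    """Successive terms of (p + q)**n with q = 1 - p."""
    q = 1.0 - p
    return [comb(n, k) * p ** (n - k) * q ** k for k in range(n + 1)]


symmetric = machine_columns(0.5, 20)  # the 21 columns of (1/2 + 1/2)^20
skew = machine_columns(0.9, 20)       # an asymmetrical setting, p : q = 9 : 1
```

Under this model the column heights agree with the terms of the binomial to rounding error, the symmetric setting having its tallest column in the middle and the skew setting having it close to one end.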

Just as the normal frequency curve may be obtained by running a continuous
curve through the point-binomial $(\tfrac12 + \tfrac12)^n$ when n is fairly large, so a more general
form of the probability curve may be obtained by running a continuous curve through
the general binomial $(p + q)^n$. As the great and only true test of the normal curve
is: Does it really fit observations and measurements of a symmetrical kind? so the
best argument for the generalised probability curve deduced in this paper is that it
does fit, and fit surprisingly accurately, observations of an asymmetrical character.
Indeed, there are very few results which have been represented by the normal curve
which do not better fit the generalised probability curve, a slight degree of
asymmetry being probably characteristic of nearly all groups of measurements.
Before deducing the generalised probability curve, it may be well to show how any
asymmetrical curve may be fitted with its closest point-binomial. This will be the
topic of the following five articles.

(2.) Consider a series of rectangles on equal base c and whose heights are respectively
the successive terms of the binomial $(p + q)^n \times \alpha/c$, where $p + q = 1$. Here $\alpha$ is
clearly the area of the entire system. Choose as origin a point O distant $\tfrac12 c$ from the

* I have found it convenient to use the term mode for the abscissa corresponding to the ordinate of
maximum frequency. Thus the "mean," the "mode," and the "median" have all distinct characters
important to the statistician.

boundary of the first rectangle, on the line of common bases, and let $y_r$ be the height
of the $r$th rectangle, or

$$y_r = \frac{\alpha}{c}\,\frac{n(n-1)(n-2)\cdots(n-r+2)}{(r-1)!}\,p^{\,n-r+1}q^{\,r-1}.$$

Let us find the values of

$$\Sigma\{y_r c \times (rc)^s\},$$

where s is an integer, for values of s from 0 to 4.
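It is easy to see that

$$\Sigma\{y_r c \times (rc)^s\} = \alpha c^s\,\frac{d}{dq}\left(q\,\frac{d}{dq}\left(q \cdots \frac{d}{dq}\{q\,(p+q)^n\}\right)\right),$$

where the operation d/dq is repeated s times, p being treated as independent of q.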

The operations indicated can easily be performed by putting $q = e^{\theta}$, when

$$\Sigma\{y_r c \times (rc)^s\} = \alpha c^s e^{-\theta}\left(\frac{d}{d\theta}\right)^{s}\{e^{\theta}(p + e^{\theta})^n\},$$

and the successive values can be found by Leibnitz's theorem. After differentiation
we may put $p + q$ or $p + e^{\theta} = 1$. There results:



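$$\begin{aligned}
\Sigma(y_r c) &= \alpha,\\
\Sigma(y_r c \times rc) &= \alpha c\{1 + nq\},\\
\Sigma(y_r c \times (rc)^2) &= \alpha c^2\{1 + 3nq + n(n-1)q^2\},\\
\Sigma(y_r c \times (rc)^3) &= \alpha c^3\{1 + 7nq + 6n(n-1)q^2 + n(n-1)(n-2)q^3\},\\
\Sigma(y_r c \times (rc)^4) &= \alpha c^4\{1 + 15nq + 25n(n-1)q^2 + 10n(n-1)(n-2)q^3\\
&\qquad\quad + n(n-1)(n-2)(n-3)q^4\}.
\end{aligned}$$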
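These five sums can be checked by direct summation. The following is a minimal sketch in exact rational arithmetic, taking $\alpha = c = 1$ and a positive integer n; the function names `raw_moment_sum` and `closed_form` are illustrative only.

```python
from fractions import Fraction
from math import comb


def raw_moment_sum(n, q, s):
    """Direct sum over r of (r-th term of (p+q)^n) * r^s with p = 1 - q,
    i.e. the sum of y_r c (rc)^s in units where alpha = c = 1."""
    p = 1 - q
    return sum(comb(n, r - 1) * p ** (n - r + 1) * q ** (r - 1) * r ** s
               for r in range(1, n + 2))


def closed_form(n, q, s):
    """The bracketed expressions of the text for s = 0, 1, 2, 3, 4."""
    forms = [
        1,
        1 + n * q,
        1 + 3 * n * q + n * (n - 1) * q ** 2,
        1 + 7 * n * q + 6 * n * (n - 1) * q ** 2
        + n * (n - 1) * (n - 2) * q ** 3,
        1 + 15 * n * q + 25 * n * (n - 1) * q ** 2
        + 10 * n * (n - 1) * (n - 2) * q ** 3
        + n * (n - 1) * (n - 2) * (n - 3) * q ** 4,
    ]
    return forms[s]


q = Fraction(1, 10)
agree = all(raw_moment_sum(n, q, s) == closed_form(n, q, s)
            for n in (1, 5, 14) for s in range(5))
```

With rational q the comparison is exact, so `agree` does not depend on any rounding tolerance.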



Let NG be the vertical through the centroid of the system of rectangles, then
clearly

$$ON = \Sigma\{y_r c \times rc\}/\alpha = c\{1 + nq\}.$$
We shall now proceed to find the first four moments of the system of rectangles
round GN. If the inertia of each rectangle might be considered as concentrated along
its mid vertical, we should have for the $s$th moment round NG, writing $d = c(1 + nq)$,

$$\alpha\mu_s = \Sigma\{y_r c \times (rc - d)^s\}.$$

The resulting values are

$$\begin{aligned}
\mu_2 &= npq\,c^2,\\
\mu_3 &= npq(p - q)\,c^3,\\
\mu_4 &= npq\{1 + 3(n-2)pq\}\,c^4,
\end{aligned}$$
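whence, remembering that $p + q = 1$, we find that p and q are roots of

$$\xi^2 - \xi + \frac{(3\mu_2^2 - \mu_4)\mu_2 + \mu_3^2}{2\{2(3\mu_2^2 - \mu_4)\mu_2 + 3\mu_3^2\}} = 0,$$

while

$$n = \frac{2\mu_2^3}{(3\mu_2^2 - \mu_4)\mu_2 + \mu_3^2}, \qquad c = \frac{\sqrt{2(3\mu_2^2 - \mu_4)\mu_2 + 3\mu_3^2}}{\mu_2}.$$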



Thus, when $\mu_2$, $\mu_3$, and $\mu_4$ have been calculated for the frequency curve, the
elements of the point-binomial are known. These results were given by me in a
letter to 'Nature,' October 26, 1893.
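The passage from the three moments to the elements of the point-binomial may be illustrated as follows. This is a sketch under the loaded-ordinate values $\mu_2 = npqc^2$, $\mu_3 = npq(p-q)c^3$, $\mu_4 = npq\{1 + 3(n-2)pq\}c^4$ of § 2; the names `binomial_moments` and `fit_point_binomial` are illustrative, and the rule that p is taken as the larger root when $\mu_3$ is positive is an assumption that follows from the sign of $p - q$ in $\mu_3$.

```python
from math import sqrt


def binomial_moments(n, p, c):
    """Moments about the centroid for loaded ordinates, as in article (2.)."""
    q = 1.0 - p
    mu2 = n * p * q * c ** 2
    mu3 = n * p * q * (p - q) * c ** 3
    mu4 = n * p * q * (1 + 3 * (n - 2) * p * q) * c ** 4
    return mu2, mu3, mu4


def fit_point_binomial(mu2, mu3, mu4):
    """Recover (n, p, q, c) from mu2, mu3, mu4 by solving those three
    moment equations together with p + q = 1."""
    k1 = (3 * mu2 ** 2 - mu4) * mu2 + mu3 ** 2          # equals 2 mu2^3 / n
    k2 = 2 * (3 * mu2 ** 2 - mu4) * mu2 + 3 * mu3 ** 2  # equals mu2^2 c^2
    if k1 <= 0 or k2 <= 0:
        raise ValueError("no real point-binomial for these moments")
    pq = 0.5 * k1 / k2
    n = 2 * mu2 ** 3 / k1
    c = sqrt(k2) / mu2
    root = sqrt(1 - 4 * pq)          # |p - q|
    p = 0.5 * (1 + root) if mu3 >= 0 else 0.5 * (1 - root)
    return n, p, 1 - p, c


n_fit, p_fit, q_fit, c_fit = fit_point_binomial(*binomial_moments(20, 0.9, 2.0))
```

Feeding the moments of a known binomial back through the fit returns the constants it was built from, which is a convenient check on the algebra.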

They give quite a fair solution so long as n is large and c small, i.e., so long as the
asymmetry and the "excess" ('Phil. Trans.,' vol. 185, A, p. 93), measured respectively
by $\mu_3$ and $\mu_4 - 3\mu_2^2$ (which vanish for the normal curve) are not considerable.*
In many cases, however, they are considerable, and the following solution is perfectly
general.

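* If $y_0$ denote the largest term in $(p + q)^n$ and $y_t$ the $t$th term beyond it, then an application of
Stirling's theorem, if n be large, shows that

$$y_t = y_0\left(1 - \frac{t}{pn}\right)^{t - pn - \frac12}\left(1 + \frac{t}{qn}\right)^{-t - qn - \frac12}.$$

Take

$$\log u = \left(t - pn - \tfrac12\right)\log\left(1 - \frac{t}{pn}\right),$$

$$\log v = \left(-t - qn - \tfrac12\right)\log\left(1 + \frac{t}{qn}\right),$$

and expand the right hand side in powers of t, we find

$$\log u = t\left(1 + \frac{1}{2pn}\right) - \frac{t^2}{2pn}\left(1 - \frac{1}{2pn}\right) - \frac{t^3}{6p^2n^2}\left(1 - \frac{1}{pn}\right) - \frac{t^4}{12p^3n^3}\left(1 - \frac{3}{2pn}\right) - \text{etc.}$$

Hence, remembering that $p + q = 1$, we have

$$\log uv = -\frac{t(p-q)}{2pqn} - \frac{t^2}{2pqn}\left(1 - \frac{1-2pq}{2npq}\right) + \frac{t^3(p-q)}{6p^2q^2n^2}\left(1 - \frac{1-pq}{npq}\right) - \frac{t^4(1-3pq)}{12p^3q^3n^3}\left(1 - \frac{3(1-4pq+2p^2q^2)}{2npq(1-3pq)}\right) + \text{etc.}$$

Now, making use of the values given in § 2 for $\mu_2$, $\mu_3$, and $\mu_4$, and writing $t \times c = x$, and $y_t = y$,
we find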



MR. K. PEARSON ON THE MATHEMATICAL THEORY OF EVOLUTION.

(3.) To find the nth moment of a trapezium ABCD about a line parallel to its
parallel sides, $y_1$ and $y_2$ being the lengths of the parallel sides, $x_1$, $x_2$ their distances
from the moment-axis, and $x_2 - x_1 = c$.
Let $M_n$ be the nth moment. Then

(4.) Now consider a curve of observations made up of a series of trapezia on equal
bases, as in the accompanying figure:



a"^ 



y = y^o- 2f.^ 



,-^;a~i3i-|(3-^,)) ^ ^_^ ^ ^g'^d-i^i-lis-^,)) ^ ^™y|^^(i^i-l(3-^2)"f(3-^3P-v^i^-i(3-p2)^]) x etc. 



X e- 2fx/ X e 6m,2--^ 



where $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$.

This appears to be the more general form of a result given by Professor Edgeworth, 'Roy. Soc.
Proc.,' vol. 56, p. 271.

For the normal curve $\mu_3 = 0$, $\mu_4 = 3\mu_2^2$; hence, if p does not differ much from q, $\beta_1$ and $\beta_2 - 3$ will
be small, and we may neglect their products with $x/\sqrt{\mu_2}$. Thus approximately



This agrees with Professor Edgeworth's special case if we expand the second exponential. His
"negative frequency" is accounted for by the fact that he has only taken the first terms of a long
series, i.e.,

$$y = y_0 e^{-x^2/2\mu_2}\left\{1 - \frac{\mu_3}{2\mu_2^2}\left(x - \frac{x^3}{3\mu_2}\right)\right\}.$$

I have not considered this form of the skew-curve at length, because it is only a first approximation to
the more general forms considered in this paper, and further, because it is only applicable in practice
within extremely narrow limits.
Here $y_1, y_2, y_3, \ldots y_r$ are the frequencies of deviations falling within the ranges
$x_1 \pm \tfrac12 c$, $x_2 \pm \tfrac12 c$, $x_3 \pm \tfrac12 c \ldots x_r \pm \tfrac12 c \ldots$, and the tops of the ordinates are joined
to form a frequency-curve in the usual manner.

Let $M'_n$ be the nth moment of the system of trapezia about the line Oy, then

$$M'_n = \Sigma\, c\,y_r\left\{x_r^{\,n} + \frac{n(n-1)}{12}c^2x_r^{\,n-2} + \frac{n(n-1)(n-2)(n-3)}{360}c^4x_r^{\,n-4} + \text{etc.}\right\}.$$

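In particular, if we take Oy in the position O'y' at distance c from $y_1$, we have
$x_r = rc$, and accordingly,

$$M'_n = c^{\,n+1}\left\{N'_n + \frac{n(n-1)}{12}N'_{n-2} + \frac{n(n-1)(n-2)(n-3)}{360}N'_{n-4} + \frac{n(n-1)(n-2)(n-3)(n-4)(n-5)}{20160}N'_{n-6} + \text{etc.}\right\},$$

where $N'_n = \Sigma(y_r r^n)$.
In particular,

$$\begin{aligned}
M'_0 &= cN'_0,\\
M'_1 &= c^2N'_1,\\
M'_2 &= c^3(N'_2 + \tfrac16 N'_0),\\
M'_3 &= c^4(N'_3 + \tfrac12 N'_1),\\
M'_4 &= c^5(N'_4 + N'_2 + \tfrac1{15}N'_0),\\
M'_5 &= c^6(N'_5 + \tfrac53 N'_3 + \tfrac13 N'_1).
\end{aligned}$$

When we put $M'_s/M'_0 = \mu'_s$ and $N'_s/N'_0 = \nu'_s$, these reduce to

$$\begin{aligned}
\mu'_1 &= c\,\nu'_1,\\
\mu'_2 &= c^2(\nu'_2 + \tfrac16),\\
\mu'_3 &= c^3(\nu'_3 + \tfrac12\nu'_1),\\
\mu'_4 &= c^4(\nu'_4 + \nu'_2 + \tfrac1{15}),\\
\mu'_5 &= c^5(\nu'_5 + \tfrac53\nu'_3 + \tfrac13\nu'_1).
\end{aligned}$$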
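The trapezia formulas can be compared with a direct integration of the frequency polygon. The sketch below assumes that the polygon is closed by zero ordinates one base-length beyond each end of the series, which is the case to which the correction terms strictly apply; `polygon_moment` and `series_moment` are illustrative names, and exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction


def polygon_moment(y, c, n):
    """Exact n-th moment about the origin of the polygon joining the points
    (r*c, y_r), r = 1..m, closed by zero ordinates at 0 and (m + 1)*c."""
    pts = [(Fraction(0), Fraction(0))]
    pts += [(r * c, Fraction(v)) for r, v in enumerate(y, start=1)]
    pts.append(((len(y) + 1) * c, Fraction(0)))
    total = Fraction(0)
    for (xa, ya), (xb, yb) in zip(pts, pts[1:]):
        slope = (yb - ya) / (xb - xa)
        const = ya - slope * xa            # ordinate = const + slope * x
        total += const * (xb ** (n + 1) - xa ** (n + 1)) / (n + 1)
        total += slope * (xb ** (n + 2) - xa ** (n + 2)) / (n + 2)
    return total


def series_moment(y, c, n):
    """c^(n+1) {N'_n + n(n-1)/12 N'_(n-2) + ...}, valid here for n <= 7."""
    def big_n(k):
        return sum(Fraction(v) * r ** k for r, v in enumerate(y, start=1))

    coefficients = {0: Fraction(1), 2: Fraction(1, 12),
                    4: Fraction(1, 360), 6: Fraction(1, 20160)}
    total = Fraction(0)
    for j, a in coefficients.items():
        if j <= n:
            falling = 1
            for i in range(j):
                falling *= n - i           # n (n-1) ... (n-j+1)
            total += a * falling * big_n(n - j)
    return c ** (n + 1) * total


frequencies = [3, 10, 14, 7, 2]            # made-up illustrative frequencies
base = Fraction(1, 2)
matches = all(polygon_moment(frequencies, base, n)
              == series_moment(frequencies, base, n) for n in range(6))
```

The frequencies are invented for the purpose of the check and are not taken from any of the observations discussed in the paper.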



Now let $\mu_n$ be the value of the nth moment of the trapezia system about the
vertical through its centroid divided by its area.



We have:

$$\mu_n = \mu'_n - n\mu'_1\mu'_{n-1} + \frac{n(n-1)}{1\cdot 2}\mu_1'^{\,2}\mu'_{n-2} - \frac{n(n-1)(n-2)}{1\cdot 2\cdot 3}\mu_1'^{\,3}\mu'_{n-3} + \text{etc.}$$

Thus we find:

$$\begin{aligned}
\mu_2 &= c^2(\nu'_2 - \nu_1'^{\,2} + \{\tfrac16\}),\\
\mu_3 &= c^3(\nu'_3 - 3\nu'_1\nu'_2 + 2\nu_1'^{\,3}),\\
\mu_4 &= c^4(\nu'_4 - 4\nu'_1\nu'_3 + 6\nu_1'^{\,2}\nu'_2 - 3\nu_1'^{\,4} + \{\nu'_2 - \nu_1'^{\,2} + \tfrac1{15}\}).
\end{aligned}$$
Comparing these results with those given in the 'Phil. Trans.,' vol. 185, p. 79,
Eq. (4), we see that treating the curve as built up of trapezia instead of loaded
ordinates introduces into the values of the μ's the parts enclosed in curled brackets.
These additions are small, but in many cases quite sensible. Since the series of
trapezia gives in general a closer approach than the series of loaded ordinates to the
frequency curve, and, further, since the calculation of these additional terms is not
very laborious, it will be better for the future to calculate the moments of any
frequency curve from the above modified formulas.

(5.) Returning now to the point-binomial, we have:

$$\begin{aligned}
\nu'_1 &= 1 + nq,\\
\nu'_2 &= 1 + 3nq + n(n-1)q^2,\\
\nu'_3 &= 1 + 7nq + 6n(n-1)q^2 + n(n-1)(n-2)q^3,\\
\nu'_4 &= 1 + 15nq + 25n(n-1)q^2 + 10n(n-1)(n-2)q^3 + n(n-1)(n-2)(n-3)q^4.
\end{aligned}$$

Thus:

$$\begin{aligned}
\mu_2 &= c^2(npq + \tfrac16),\\
\mu_3 &= c^3\,npq(p - q),\\
\mu_4 &= c^4(\tfrac1{15} + npq(2 + 3(n-2)pq)).
\end{aligned}$$

If, instead of taking trapezia, we had taken a series of rectangles, but not, as in § 2,
concentrated their areas along their axes, we should have found the following
system:

$$\begin{aligned}
\mu_2 &= c^2(npq + \tfrac1{12}),\\
\mu_3 &= c^3\,npq(p - q),\\
\mu_4 &= c^4(\tfrac1{80} + npq(\tfrac32 + 3(n-2)pq)).
\end{aligned}$$
Hence if we write:

$$\begin{aligned}
\mu_2 &= c^2(npq + \epsilon_1),\\
\mu_3 &= c^3\,npq(p - q),\\
\mu_4 &= c^4(\epsilon_2 + npq(\epsilon_3 + 3(n-2)pq)),
\end{aligned}$$

we have:

For trapezia: $\epsilon_1 = \tfrac16$, $\epsilon_2 = \tfrac1{15}$, $\epsilon_3 = 2$.

For rectangles: $\epsilon_1 = \tfrac1{12}$, $\epsilon_2 = \tfrac1{80}$, $\epsilon_3 = 1.5$.

For loaded ordinates: $\epsilon_1 = 0$, $\epsilon_2 = 0$, $\epsilon_3 = 1$,

and the above general system may be applied to all cases.



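Writing

$$z = npq, \qquad \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \text{and} \qquad \beta_2 = \frac{\mu_4}{\mu_2^2},$$

we have by elimination the cubic for z:

$$z^3(6 + 3\beta_1 - 2\beta_2) + z^2(2\epsilon_3 - 3 + 9\beta_1\epsilon_1 - 4\beta_2\epsilon_1) + z(2\epsilon_2 + 9\beta_1\epsilon_1^2 - 2\beta_2\epsilon_1^2) + 3\beta_1\epsilon_1^3 = 0.$$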

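The remaining constants of the binomial are:

$$pq = \tfrac14\left\{1 - \frac{\beta_1(z + \epsilon_1)^3}{z^2}\right\}, \qquad n = \frac{z}{pq},$$

and

$$c = \sqrt{\frac{\mu_2}{z + \epsilon_1}}.$$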



(6.) Let us illustrate these results by a numerical example. Plate 8 gives
Dr. Venn's curve for 4857 barometric heights. Along the horizontal, 1 cm. equals .1″
of height of barometer, and the scale of frequency is 1 sq. cm. = 28.304 observations.
The centroid vertical and the second, third, and fourth moments about it were found
for me‡ by the graphical process described, 'Phil. Trans.,' vol. 185, p. 79. We have
the following results:

* This result seems of considerable importance, and I do not believe it has yet been noticed. It gives
the mean square error for any binomial distribution, and we see that for most practical purposes it is
identical with the value $\sqrt{npq}$, hitherto deduced as an approximate result, by assuming the binomial to
be approximately a normal curve.

† If we take $z + \epsilon_1 = \chi$ the fundamental cubic reduces, for trapezia, to

$$(6 + 3\beta_1 - 2\beta_2)\chi^3 - (2 - \tfrac13\beta_2)\chi^2 + \tfrac3{10}\chi - \tfrac1{45} = 0,$$
a form in which the coefficients are easily calculated and the nature of the roots discriminated.

‡ By Mr. G. U. Yule, who has given me very great assistance in the laborious calculations required
in the reduction of frequency curves. We have used, with much economy of time, the "Brunsviga"
calculator.
$$\alpha = 171.6, \qquad \mu_2 = 10.14, \qquad \mu_3 = 15.95, \qquad \mu_4 = 326.34,$$

all in centimetre units.
These give

$$\beta_1 = .24401, \qquad \beta_2 = 3.1739.$$
Hence for trapezia,

$$.3842z^3 - .74991z^2 + .018008z + .003389 = 0,$$

and for rectangles,

$$.3842z^3 - .87496z^2 - .003832z + .000424 = 0.$$



These give the following solutions:

              Trapezia.    Rectangles.    Lines.
    z         1.92516      2.28034        2.6028
    n         19.379       23.983         28.5293
    p         .8881        .8936          .89985
    q         .1119        .1064          .10015
    c         2.2017       2.0712         1.974
    α/c       77.94        82.85          86.93
    d         6.976        7.3562         7.614



Here $d = c(1 + nq)$ gives the distance of the start of the point-binomial from the
centroid vertical. The three point-binomials are therefore

$$77.94\,(.8881 + .1119)^{19.379},$$

$$82.85\,(.8936 + .1064)^{23.983},$$

$$86.93\,(.89985 + .10015)^{28.5293},$$
respectively.
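The barometric solution can be retraced numerically from $\mu_2$, $\beta_1$ and $\beta_2$. The sketch below makes several assumptions that are not in the text: the largest real root of the cubic is the one wanted, it is found by Newton's method started above all the roots, the remaining constants are taken from $pq = \tfrac14\{1 - \beta_1(z+\epsilon_1)^3/z^2\}$, $n = z/pq$, $c = \sqrt{\mu_2/(z+\epsilon_1)}$, and p is taken as the larger of the pair. The names `EPSILONS`, `largest_real_root` and `fit_with_correction` are illustrative.

```python
from math import sqrt

# (epsilon_1, epsilon_2, epsilon_3) for the three ways of treating the areas
EPSILONS = {
    "trapezia": (1 / 6, 1 / 15, 2.0),
    "rectangles": (1 / 12, 1 / 80, 1.5),
    "lines": (0.0, 0.0, 1.0),
}


def largest_real_root(a3, a2, a1, a0):
    """Largest real root of a3 z^3 + a2 z^2 + a1 z + a0 (a3 > 0), by
    Newton's method started from a bound lying above every root."""
    if a3 <= 0:
        raise ValueError("leading coefficient must be positive")
    z = 1 + max(abs(a2), abs(a1), abs(a0)) / a3
    for _ in range(200):
        f = ((a3 * z + a2) * z + a1) * z + a0
        df = (3 * a3 * z + 2 * a2) * z + a1
        step = f / df
        z -= step
        if abs(step) < 1e-14:
            break
    return z


def fit_with_correction(mu2, beta1, beta2, kind):
    """Solve the cubic for z = npq, then recover n, p, q and c."""
    e1, e2, e3 = EPSILONS[kind]
    z = largest_real_root(
        6 + 3 * beta1 - 2 * beta2,
        2 * e3 - 3 + 9 * beta1 * e1 - 4 * beta2 * e1,
        2 * e2 + 9 * beta1 * e1 ** 2 - 2 * beta2 * e1 ** 2,
        3 * beta1 * e1 ** 3,
    )
    pq = 0.25 * (1 - beta1 * (z + e1) ** 3 / z ** 2)
    n = z / pq
    p = 0.5 * (1 + sqrt(1 - 4 * pq))   # assumption: p is the larger of p, q
    c = sqrt(mu2 / (z + e1))
    return z, n, p, 1 - p, c


z_t, n_t, p_t, q_t, c_t = fit_with_correction(10.14, 0.24401, 3.1739, "trapezia")
```

Run on the values quoted above, the trapezia case comes out close to the first column of the table; small differences in the last figures are to be expected from rounding in $\beta_1$ and $\beta_2$.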

These three point-binomials are represented in Plate 8, fig. 3. It will be noticed
that they all lie very close to the barometric curve; they would be still closer if that
curve were a real curve and not a polygonal line. The total areas between
binomial-polygons and observation curves, treating all parts as positive, are for the three cases,
10.3, 10.5, 11.0 sq. centims. respectively, or taking the base range to be 23 centims., we
have mean deviations from the observation curve of .448, .457, .478 in the three cases
respectively. Thus the method of trapezia gives slightly the best result; the method
of concentrating along ordinates the worst result. The total area of the curve being
171.6, we have from another standpoint, mean percentage errors* in the ordinates
of about 6.03, 6.06, and 6.3, respectively. The generalised probability curve, if fitted
to the same observations, gives an areal deviation of 7 sq. centims., or a percentage
error of about 4. Thus it is very nearly one-third as close again as the point-binomials.

* The "percentage error" in ordinate is, of course, only a rough test of the goodness of fit, but I have
used it in default of a better.
As typical samples of mean percentage errors considered by various statisticians to
give good results, I may note the following, the frequency being about 1,000 or
upwards: Airy, 9; Merriman, 13.5; Galton (Anthropometric), 7 to 15; Weldon
(Crabs), 6.7, (Shrimps), 8.8; Stieda (Skulls), 7.6; Porter (School Girls), 7.7; Perozzo
(Recruits), 6.8; Bradley's observations, 5.85; Pearson (Lottery), 15.7, (Tossing), 6.6.

It is therefore clear that our point-binomials and generalized curve may be considered
to give good results.* It will be noticed, however, that a little difference in
the method of calculating the point-binomials leads, without much alteration of the
percentage error, to a considerable change in their centroid-positions and the magnitude
of their constants.† Generally speaking we may conclude that in round numbers the
barometric frequency corresponds to the binomial $(.9 + .1)^{20}$ or to the distribution of
zeros when 20 ten-sided teetotums, marked 0, 1 ... 9, are spun together. There is
an apparent upper limit to the height of the barometer, and its deviation below the
mean can be much greater than its deviation above. At the same time within the
narrower range round the mean, the frequency of a high barometer is greater than
the frequency of a low barometer; the odds against a "contributory cause" tending
to a low barometer being about 9 to 1. I propose to investigate a wider series of
barometric observations, in order to test how far the conclusions which may be drawn
from Dr. Venn's statistics are general.‡

A rather interesting point may be considered at this stage. Is it always possible
to fit a point-binomial to a series of observations with a chance frequency? Can we
better the normal curve by a point-binomial? The answer is Yes, if the fundamental
cubic in χ (second footnote, p. 351) has a real positive root. Now for the normal curve
$2(3\mu_2^2 - \mu_4)\mu_2 + 3\mu_3^2$, or $6 + 3\beta_1 - 2\beta_2$, is zero. For the loaded ordinates c will
only be real if this expression be positive. It may, however, take small negative
values for the trapezia, in which case z itself will be small and only within narrow
limits give suitable values for n.

Hence, for real values of n, p and q, it is impossible to fit a point-binomial to a
series of observations for which $6 + 3\beta_1 - 2\beta_2$ has a large negative value. The normal
curve, for which $\mu_4 = 3\mu_2^2$, is nearer to any such observations than a point-binomial.

For example, by aid of the modified expressions given in this paper, p. 350, we have 



* As another manner of testing, compare the ten points of the point-binomial for lines with
observations:

    Theory . . . . .    5.6   16.9   21.8   19     11.9   5.7   2.1    .7   .2   .03
    Observation . . .   6.7   15.8   22.1   18.8   12     5.8   2.3   1.1   .2   .00

† A curve drawn through the 30 points of the three point-binomials would be very close to the
observations. As a matter of fact, the skew probability curve passes very near to all 30 points.

‡ [Miss A. Lee has since calculated the constants of three years of Eastbourne barometric observations
for me. While n and c differ widely from the Cambridge values, she finds p = .89375, q = .10625, a
striking and suggestive agreement.]





for the data given for Professor Weldon's Crab Measurements, No. 4, 'Phil. Trans.,'
A, vol. 185, p. 96.

$$\mu_2 = 7.6759, \qquad \mu_3 = 3.4751, \qquad \mu_4 = 184.3039.$$
Hence,

$$\beta_1 = \mu_3^2/\mu_2^3 = .0267022,$$

$$\beta_2 = \mu_4/\mu_2^2 = 3.12807.$$

Thus $6 + 3\beta_1 - 2\beta_2$ is negative (its value is $-.17603$), and accordingly no rational point-binomial is likely
to fit as well as the normal curve. As a matter of fact the fundamental cubic is now

$$.17603z^3 + 1.04532z^2 + .033773z - .0003709 = 0.$$

The two negative roots of this equation give imaginary values for p and q. The
small positive root gives p greater than unity and q negative; n is also negative.
Although I can give no interpretation to these results, it seemed well to complete in
the latter case the solution and test how near the resulting point-binomial fitted the
curves. I found

$$z = .00866, \qquad p = 1.19268, \qquad q = -.19268,$$

$$n = -.037685, \qquad c = 6.61662, \qquad d = 6.6645.$$

These give for the binomial

$$150.0983\,(1.19268 - .19268)^{-.037685},$$

or,

$$151.89\,(1 - .161552)^{-.037685},$$

or,

$$151.89 + .92532 + .07756 + \text{\&c.}$$
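The expansion with a negative fractional index may be followed term by term. A minimal sketch, in which the helper name `generalized_binomial_terms` is illustrative and the constants are those just quoted:

```python
def generalized_binomial_terms(scale, ratio, index, count):
    """First `count` terms of scale * (1 + ratio)**index, using the general
    binomial coefficients index (index - 1) ... (index - k + 1) / k!."""
    terms = []
    coefficient = 1.0
    for k in range(count):
        terms.append(scale * coefficient * ratio ** k)
        coefficient *= (index - k) / (k + 1)
    return terms


terms = generalized_binomial_terms(151.89, -0.161552, -0.037685, 5)
```

The second and third terms agree with .92532 and .07756 to about the third decimal place, and the later terms fall off rapidly, which is why only a few ordinates are sensible on the scale of the figure.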

Thus the sensible part of the binomial to the scale of our figure is a triangle. I
have drawn this binomial, see Plate 8, fig. 4. The reader will mark a fit very close
on the whole to the observations. We have the following percentage mean errors of
the ordinates:

    Normal curve . . . . . . . . .    6.7,
    Skew probability curve . . . .    4.4,
    Binomial . . . . . . . . . . .   10.5.

We may conclude, therefore, that even if our binomial constants have unintelligible
values, yet our method will give, in many cases, a closely-fitting polygonal figure.
This remark should be read in connection with Professor Edgeworth's somewhat
divergent views* on fitting chance distributions with curves other than the normal
error curve. It is possible in almost every case to find simple combinations of lines,
circles, or parabolas of various degrees which give results extremely close to any given
set of observations.

* See 'Phil. Mag.,' vol. 334, p. 24, et seq., 1887.

For example, taking the range of frequency to be sensibly π times the standard
deviation, we have the following close expression for the error function by harmonic
analysis

$$y = y_0\left\{.399 + .482\cos\frac{x}{\sigma} + .109\cos\frac{2x}{\sigma} + .009\cos\frac{3x}{\sigma}\right\}.$$

Here $y_0$ is the maximum ordinate, x any deviation, and σ the standard deviation.
A couple of wave curves* will thus very frequently give us a close approximation to
a set of statistical measurements, quite as close as statistical practice shows the error
curve to be.

The above expression further allows the normal curve to be constructed by aid of
scale and compasses, geometrically, or its ordinates calculated from a table of cosines.
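The closeness of the harmonic expression is easily examined. The sketch below assumes that the cosines are taken of $x/\sigma$, $2x/\sigma$ and $3x/\sigma$, and compares the expression with $e^{-x^2/2\sigma^2}$ over deviations up to three times the standard deviation; the function names are illustrative.

```python
from math import cos, exp


def harmonic_ordinate(u):
    """Four-term cosine expression for y / y0 at u = x / sigma."""
    return 0.399 + 0.482 * cos(u) + 0.109 * cos(2 * u) + 0.009 * cos(3 * u)


def normal_ordinate(u):
    """Normal curve ordinate y / y0 at u = x / sigma."""
    return exp(-u * u / 2)


# largest discrepancy on a grid of deviations from -3 sigma to +3 sigma
worst = max(abs(harmonic_ordinate(k / 100) - normal_ordinate(k / 100))
            for k in range(-300, 301))
```

On this grid the two ordinates differ by less than one hundredth of the maximum ordinate, the largest differences occurring in the tails.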

Another example of the fitting of a point-binomial will be found in Part 2, § 34, 
Pauper Percentages. 

(7.) Consider the point-binomial $e \times (\tfrac12 + \tfrac12)^n$, where e is any constant, and
suppose a polygon formed by plotting up the terms of the binomial at distance c
from each other.

Then, corresponding to $x_r = rc$, we have

$$y_r = e\,\frac{n(n-1)(n-2)\cdots(n-r+2)}{(r-1)!}\left(\tfrac12\right)^n,$$

and

$$\frac{y_{r+1} - y_r}{\tfrac12(y_{r+1} + y_r)} = \frac{2\{c(n+2) - (x_r + x_{r+1})\}}{c(n+1)} = -\frac{2(x'_r + x'_{r+1})}{c(n+1)},$$

if $x'_r = x_r - \tfrac12 c(n+2)$.

Now $(y_{r+1} - y_r)/c$ is the slope of the polygon corresponding to the mean ordinate
$\tfrac12(y_{r+1} + y_r)$, or, writing† $\sigma^2 = \tfrac12 \times \tfrac12(n+1)c^2$,

$$\frac{\text{slope of polygon}}{\text{mean ordinate}} = -\frac{2 \times \text{mean abscissa}}{2\sigma^2}.$$
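That this relation holds exactly, and not merely for large n, can be confirmed in rational arithmetic. A minimal sketch, with the constant e taken as unity and the illustrative name `slope_identity_holds`:

```python
from fractions import Fraction
from math import comb


def slope_identity_holds(n, c=Fraction(1)):
    """Check, for every side of the polygon of (1/2 + 1/2)^n, that
    slope / mean ordinate = -(mean abscissa) / sigma^2,
    with sigma^2 = (n + 1) c^2 / 4 and abscissae measured from c (n + 2) / 2."""
    y = {r: Fraction(comb(n, r - 1), 2 ** n) for r in range(1, n + 2)}
    centre = c * Fraction(n + 2, 2)
    sigma2 = Fraction(n + 1, 4) * c * c
    for r in range(1, n + 1):
        slope = (y[r + 1] - y[r]) / c
        mean_ordinate = (y[r + 1] + y[r]) / 2
        mean_abscissa = (r * c + (r + 1) * c) / 2 - centre
        if slope / mean_ordinate != -mean_abscissa / sigma2:
            return False
    return True


holds = all(slope_identity_holds(n) for n in (1, 2, 3, 10, 20))
```

The check succeeds for n as small as 1, which is the point of the argument: the relation is independent of the magnitude of n.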

* It is often sufficient to take

$$y = y_0\left\{\tfrac25 + \tfrac12\cos\frac{x}{\sigma} + \tfrac1{10}\cos\frac{2x}{\sigma}\right\}.$$

† The divergence of this value of $\sigma^2$ from the ordinary value $\tfrac12 \times \tfrac12 \times nc^2$ is to be noted. The two agree
sensibly if n be great. [Drawing on a large scale, however, the point-binomial $(\tfrac12 + \tfrac12)^{10}$ and the two
normal curves with standard deviations of 1.5811 and 1.6583, I find that the latter has a mean percentage
error of only 1.76 as compared with 5.1 of the former. Thus it would appear that the normal curve
corresponding to $\sqrt{(n+1)pq}$ fits the point-binomial closer than one with the standard deviation $\sqrt{npq}$
usually adopted.]

Now compare this property of the polygon with that of the curve:

$$y = y_0 e^{-x^2/2\sigma^2}.$$

We have by differentiation:

$$\frac{\text{slope of curve}}{\text{ordinate}} = -\frac{2 \times \text{abscissa}}{2\sigma^2}.$$

Hence: this binomial polygon and the normal curve of frequency have a very close
relation to each other, of a geometrical nature, which is quite independent of the
magnitude of n. In short their slopes are given by an identical relation. By a
proper choice of σ and $y_0$, we can get the normal curve to fit closely the
point-binomial, owing to this slope property, without any assumption as to the indefinitely
great value of n. It is this geometrical property which is largely the justification for
the manner in which statisticians apply, and apply with success, the normal curve to
cases in which n is undoubtedly small. No stress seems hitherto to have been laid
upon the fact that the normal curve of errors besides being the limit of a symmetrical
point-binomial has also this intimate geometrical relationship with it.*

(8.) Now let us deal with the skew point-binomial in precisely the same manner as
we have dealt with the symmetrical binomial. Taking its form to be $e(p + q)^n$, we
have, if $x_r = r \times c$ and $\lambda = q/p$:

$$\frac{y_{r+1} - y_r}{\tfrac12(y_{r+1} + y_r)\,c} = \frac{2\{(n-r+1)\lambda/r - 1\}}{c\{(n-r+1)\lambda/r + 1\}} = \frac{2(\lambda(n+1) - r(\lambda+1))}{c(\lambda(n+1) + r(1-\lambda))}.$$

Let us write $\Delta y_r = y_{r+1} - y_r$, $\Delta x_r = c$.
Then $x_{r+\frac12}/c = r + \tfrac12$, and:

* The following table shows the closeness of frequency within a given range as determined by the
binomials:



Ransfe of 


Freqnencj per cent. 


E'ormal curve. 

24 
38 
52 
73 
87 
96 
100 


deviation. 


(1 + 1)^0. 


(1 + ly'K 


3 

7 
11 
15 
21 
33 


24 
37 
50 
71 
87 
96 
100 


23 
37 
52 
73 
87 
96 
100 



Here the distribution of 100 groups each of 100 events is seen to be practically the same whether we
take w = 10 or ^* = GO . 






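$$\frac{1}{y_{r+\frac12}}\frac{\Delta y_r}{\Delta x_r} = \frac{2}{c}\,\frac{\lambda(n+1) - (1+\lambda)\left(\dfrac{x_{r+\frac12}}{c} - \tfrac12\right)}{\lambda(n+1) + (1-\lambda)\left(\dfrac{x_{r+\frac12}}{c} - \tfrac12\right)},$$

where $y_{r+\frac12} = \tfrac12(y_{r+1} + y_r)$ is the mean ordinate, or, if $x'_{r+\frac12} = x_{r+\frac12} - c(\tfrac12 + q(n+1))$,

$$\frac{1}{y_{r+\frac12}}\frac{\Delta y_r}{\Delta x_r} = -\frac{x'_{r+\frac12}}{pq(n+1)c^2 + \tfrac12(p-q)\,c\,x'_{r+\frac12}} = -\frac{\gamma\,x'_{r+\frac12}}{a + x'_{r+\frac12}},$$

if $\gamma = \dfrac{2}{(p-q)c}$ and $a = \dfrac{2pq(n+1)c}{p-q}$.

The curve which has the same law of slope as this skew binomial is:

$$\frac{1}{y}\frac{dy}{dx} = -\frac{\gamma x}{a + x}, \qquad \text{or} \qquad y = y_0\left(1 + \frac{x}{a}\right)^{\gamma a}e^{-\gamma x}.$$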
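The law of slope of the skew binomial polygon can likewise be verified exactly. The sketch below takes e as unity and checks, for every side of the polygon, that the slope divided by the mean ordinate equals $-x'/\{pq(n+1)c^2 + \tfrac12(p-q)cx'\}$, with $x'$ the mean abscissa measured from $c(\tfrac12 + q(n+1))$; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def skew_slope_identity_holds(n, q, c=Fraction(1)):
    """Exact check of the slope relation for the polygon of (p + q)^n
    plotted at x_r = r c, with p = 1 - q."""
    p = 1 - q
    y = {r: comb(n, r - 1) * p ** (n - r + 1) * q ** (r - 1)
         for r in range(1, n + 2)}
    for r in range(1, n + 1):
        lhs = (y[r + 1] - y[r]) / (c * (y[r + 1] + y[r]) / 2)
        x_mid = (r + Fraction(1, 2)) * c
        x_prime = x_mid - c * (Fraction(1, 2) + q * (n + 1))
        rhs = -x_prime / (p * q * (n + 1) * c * c + (p - q) * c * x_prime / 2)
        if lhs != rhs:
            return False
    return True


skew_ok = (skew_slope_identity_holds(20, Fraction(1, 10))
           and skew_slope_identity_holds(7, Fraction(3, 4))
           and skew_slope_identity_holds(10, Fraction(1, 2)))
```

With $q = \tfrac12$ the term in $p - q$ drops out and the relation reduces to that of the symmetrical binomial of § 7.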

(9.) This curve accordingly stands in the same relationship to the skew binomial
as the normal curve to the symmetrical binomial.* There are several points, however,
to be considered with regard to it. In the first place it is usually assumed that n is
indefinitely great and c indefinitely small, and then it is supposed that we may
neglect $(p - q)\,c\,x'_{r+\frac12}$ as compared with $pq(n+1)c^2$, and so we deduce the normal
error curve whether p be equal to q or not. But I contend that this is unjustifiable
except for very small values of $x'_{r+\frac12}$. When the deviation $x'$ is considerable and
c vanishingly small, $x'$ will be an indefinitely great multiple of c; c must be in fact
the unit in which $x'$ is measured and unless p = q, the ordinary normal curve is only
an approximation, even if n be large, near the maximum frequency. In the next
place, when we speak of n being large, are we quite clear as to what we mean in the case
of physical or biological frequency curves? We speak of a multiplicity of small
"causes" determining the actual dimensions of an organ, or the size of a physical
error, or the height of the barometer. But it is less clear why this multiplicity
should be identified with the infinite greatness of n. If we take Dr. Venn's
frequency curve for barometric height, we see that the closest point-binomial is by no
means consistent with either p = q, or with n being indefinitely great. Further,
many statistical results in games of chance are given with great exactness by the
normal curve, although we are then able to show that n is quite moderate.

Now, it is true that the biological and physical statistics to which we are referring,
give essentially continuous curves, but it does not seem to follow of necessity that n
must be infinite; while their frequent skewness sufficiently indicates that the neglect
* Note again the deviation of the constant $pq(n+1)c^2$ from its usually adopted value $pqnc^2$.
of $x'_{r+\frac12}$ as compared with a is unjustifiable. Thus, the maximum of a fever
mortality curve cannot be an infinite distance from birth, which limits the curve in
one direction, nor an age-at-marriage curve have a maximum frequency infinitely
distant from the age of puberty, nor a frequency of interest curve separate its
maximum, between 3 or 4 per cent., by an infinite distance from 0 per cent. It is
clear, therefore, that if such frequency curves as those referred to are to be treated
as chance distributions at all, it would be idle to compare them to the limit of a
symmetrical binomial. We are really quite ignorant as to the nature of the
contributory "causes" in biological, physical, or economic frequency curves. The continuity
of such frequency curves may depend upon other features than the magnitude of n.
If I toss twenty coins, a discrete series of 0, 1, 2, 3, . . . 20 heads is the only possible
range of results. Each individual coin, here representing a "contributory cause,"
can only give head or tail, and so many whole coins must give head, so many tail.
If I want to make any ratio of head to tail, I have to take an indefinitely great
number of coins, for each "contributory cause" must give a unit to the total. But
it may possibly be that continuity in biological or physical frequency curves may
arise from a limited number of "contributory causes" with a power of fractionizing
the result. We cannot conceive on the tossing of 20 coins that 13.5 will give heads
and 6.5 will give tails, we are obliged to deal with 200 coins, 135 giving heads and
65 tails. Yet the two things are not identical. The former corresponds to a value
intermediate between two ordinates of $(\tfrac12 + \tfrac12)^{20}$ and the latter to a definite ordinate
of $(\tfrac12 + \tfrac12)^{200}$. So long as we remain in ignorance of the nature and number of
"contributory causes" in physics and biology, so long as we do find markedly skew
distributions, it seems to me that we must seek more general results than flow
from the assumption that p = q and n = ∞. The form of curve given in § 8 above is
suggested as a possible form for skew frequency curves. Its justification lies
essentially, like that of the normal curve, in its capacity to express statistical
observations.

(10.) But it must be noted that the generalised probability curve in § 8, although 
it contains the normal curve as a special case, is not sufficiently general. It is 
limited in one direction, indefinitely extended in the other. This limitation at one 
end only, corresponds theoretically to many cases in economics, physics, and biology. 
But there are a great variety of cases in which there is theoretical limitation at both
ends; that is to say, there is a limited range of possible deviations. For example,
let a trapezium, ABCD, of white paper be pasted on a cylinder of black surface with
ef, the axis of symmetry, parallel to the axis of the cylinder. Then, if the cylinder
be rotated, we shall have a series of grey tints from a darkish e to a lighter f.
Now, if we ask several hundred persons to select a tint which would result from
mixing the tints at e and f, we shall obtain a continuous frequency curve, falling,
however, entirely within the range e to f. Or, again suppose a frequency curve
obtained by plotting up the frequency of a given ratio of leg-length to total
body-length, or of carapace to body-length. Here the range must lie between 0 and 1.
It is not that other values are excessively improbable, they are by the conditions
of the problem absolutely impossible. Hence, it is clear that the curves obtained
by Professor Weldon and Mr. H. Thompson in the case of shrimps, crabs, and
prawns, can only be approximately normal curves, even if it were possible for the
ratios to run from 0 to 1. But as a matter of fact, the possible range is very
much smaller. We may not be able to assert, à priori, what it is, but for an
adult prawn to have a carapace f or xq-oo of its body-length, or a man a leg 
3 or 2^- of his body-length, may be regarded as impossibilities ; they are abnor- 
malities, which could hardly survive to the adult condition. Precisely the same 
remarks apply to skull indices, and probably to the relative size of all sorts of 
organs in the adult condition. We may not know the range, a priori, but we are 
quite certain that one exists, and it is a quantity to be determined, just as the mean or 
the standard deviation, from our measurements themselves. We may take it that in 
most biological measurements of adults there is a range of stability, so to speak; 
organs not falling within this range are inconsistent with the continued existence of 
the individual, with the assumption that he has lived to be an adult.* Nor is this 
question of range confined to biological statistics. A barometric frequency curve 
must show the same peculiarity ; there are excessively low and excessively high 
barometric heights which would be not only inconsistent with the survival of any 
meteorological observer, but also with the existing features of physical nature on 
this earth. In vital statistics we find precisely the same thing, a curve of percent- 
ages of mothers of different ages for the children born during any year in a country 
would be definitely limited by the ages of puberty and the climacteric, which cannot 
be pushed indefinitely towards childhood and senility respectively. Again in disease 
and mortality curves, while the lower limit of life is clear, it is highly probable that 
an upper limit exists, if we can only fix it by investigation of our statistics them- 
selves. A man of the present day, as now organised, may be able to live 120 years, 
perhaps, but we have exceeded his vital possibilities if we take, say, 200 years. 

Thus the problem of range seems a very important one; it theoretically excludes the 
use of the normal curve in many classes of statistics. It is quite true that, for 
many practical purposes, frequency curves of limited range may be sensibly identical 
either with unlimited curves, or even with normal curves, but, in other cases, this 
is not so, and under any circumstances the limited curve may actually give information 
as to the possible range (the "limits of stability"), which is itself of great value. 

* Absolute malformations, congenital, or due to post-natal accident are excluded. Abortions or 
amputations would be naturally excluded from our measurements. 

We have, then, reached this point: that to deal effectively with statistics we 
require generalised probability curves which include the factors of skewness and range. 
The generalised curve we have already reached possesses skewness, but its range 
is limited in one direction only. 

Accordingly, we require the following types of frequency curves: 

Type I. Limited range in both directions, and skewness. 

Type II. Limited range and symmetry. 

Type III. Limited range in one direction only and skewness. 

Type IV. Unlimited range in both directions and skewness. 

Type V. Unlimited range in both directions and symmetry. 

Type V. is the normal curve; Type IV., with slight skewness, has been dealt 
with by Poisson in the form of an approximative series.* Type III. has been given 
above; it was first published by me without discussion in 'Roy. Soc. Proc.,' vol. 54, 
p. 331. 

We can now turn to the general problem. 

(11.) A very simple example will illustrate how a frequency curve, with limited 
range and skewness, may be considered to arise. Take n balls in a bag, of 
which pn are black, and qn are white, and let r balls be drawn and the number 
of black be recorded. If $r > pn$, the range of black balls will lie between $0$ and $pn$; 
the resulting frequency polygon will be skew and limited in range. This polygon, 
which is given by a hypergeometrical series, leads us to generalised probability 
curves, in the same manner as the symmetrical and skew binomials lead us 
to special cases of such curves. If we consider our balls to become fine shot, or 
ultimately sand, and suppose each individual grain to have an equal chance of being 
drawn, we obtain a continuous curve.† It is not, however, impossible that, could we 
measure with sufficient accuracy, many physical as well as biological statistics might 
be found to proceed by units, much as in certain types of economic statistics we are 
not troubled with fractions of a penny. For this reason we shall keep our results 
in the most general form, and obtain a curve approximating to the hypergeo- 
metrical series referred to without any assumptions as to the relative magnitude of 
the quantities involved. 

We easily obtain for the series giving the chances of $r$, $r-1$, $r-2$, ... $0$ black 
balls being drawn out of a bag containing $pn$ black and $qn$ white, the expression 

$$\frac{pn(pn-1)(pn-2)\cdots(pn-r+1)}{n(n-1)(n-2)\cdots(n-r+1)}\times\Big(1 + r\,\frac{qn}{pn-r+1} + \frac{r(r-1)}{1\cdot2}\,\frac{qn(qn-1)}{(pn-r+1)(pn-r+2)}$$
$$\qquad + \frac{r(r-1)(r-2)}{1\cdot2\cdot3}\,\frac{qn(qn-1)(qn-2)}{(pn-r+1)(pn-r+2)(pn-r+3)} + \ldots\Big).$$

If $y_s$ be the $s$th ordinate of this polygon, and we suppose these ordinates plotted up 
at distances $c$ apart, we have 

$$\frac{y_{s+1}}{y_s} = \frac{r-s+1}{s}\cdot\frac{qn-s+1}{pn-r+s}.$$

* "Sur la Probabilité des Jugements," chapter 3. 

† $p$ pints of red sand and $q$ pints of white sand are put into a vessel, and $r$ pints are withdrawn. We 
have, if $r > p$, a perfectly continuous frequency curve for red sand withdrawn ranging between $0$ and $p$ 
pints. We are here supposing no "perfect mixture" of the two kinds of sand, but theoretical equality 
of chances for each grain. 
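The ratio of successive ordinates lends itself to a short numerical check. The following is a minimal sketch in Python, not part of the original memoir; the function name `hypergeometric_polygon` and its arguments are illustrative assumptions, and it supposes that the number drawn does not exceed the number of black balls, so that the first term of the series is not zero.

```python
from math import comb

def hypergeometric_polygon(n_black, n_white, r):
    """Chances of 0, 1, 2, ... white balls (r, r-1, ... black) in r draws.

    Assumes r <= n_black, so that the first ordinate (all black) is non-zero.
    """
    n = n_black + n_white
    s_max = min(r, n_white)
    # First ordinate: every one of the r balls drawn is black.
    ordinates = [comb(n_black, r) / comb(n, r)]
    for s in range(1, s_max + 1):
        # Ratio of successive ordinates, as in the text:
        # (r - s + 1)/s * (qn - s + 1)/(pn - r + s), with pn = n_black, qn = n_white.
        ratio = (r - s + 1) / s * (n_white - s + 1) / (n_black - r + s)
        ordinates.append(ordinates[-1] * ratio)
    return ordinates

if __name__ == "__main__":
    for s, y in enumerate(hypergeometric_polygon(7, 5, 4)):
        print(s, round(y, 6))
```

Building the polygon from the ratio alone, and then comparing with the direct combinatorial count, confirms that the ratio and the series describe the same set of chances.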

Thus 

$$\frac{y_{s+1}-y_s}{\tfrac12(y_{s+1}+y_s)}\times\frac1c = \frac2c\,\frac{(r+1)(1+qn)-s(n+2)}{(r+1)(1+qn)-s\{2(r+1)+n(q-p)\}+2s^2}$$

$$= \frac2c\,\frac{(r+1)(1+qn)-\Big(\dfrac{x_{s+\frac12}}{c}-\dfrac12\Big)(n+2)}{(r+1)(1+qn)-\Big(\dfrac{x_{s+\frac12}}{c}-\dfrac12\Big)\{2(r+1)+n(q-p)\}+2\Big(\dfrac{x_{s+\frac12}}{c}-\dfrac12\Big)^2}.$$
Write 

and we find with our previous notation 



7^ + 2 



A,^ 1 "- X'^+. 



where 

$$\beta_1 = \frac{c^2(r+1)(n-r+1)(1+qn)(1+pn)}{(n+2)^3},\qquad \beta_2 = \frac{cn(n-2r)(p-q)}{2(n+2)^2},\qquad \beta_3 = \frac{1}{n+2}.$$


Now, if we attempt to find the curve which has the same geometrical relation for 
the slope as the above hypergeometrical polygon, we see that it will change its type 
according to the sign of $\beta_2^2 - 4\beta_1\beta_3$. 

After some reductions we have 

$$\sqrt{\beta_2^2-4\beta_1\beta_3} = \frac{cn}{n+2}\sqrt{\Big\{\frac rn-\frac12+\sqrt{\big(p+\tfrac1n\big)\big(q+\tfrac1n\big)}\Big\}\Big\{\frac rn-\frac12-\sqrt{\big(p+\tfrac1n\big)\big(q+\tfrac1n\big)}\Big\}}.$$

Hence $\sqrt{\beta_2^2-4\beta_1\beta_3}$ will be real or imaginary, according as $r/n$ lies outside or 
between the limits 

$$\tfrac12\pm\sqrt{\big(p+\tfrac1n\big)\big(q+\tfrac1n\big)}.$$

If $r/n$ lies outside these limits, then the integral of the right-hand side of 
equation (ε) is purely logarithmic; if it lies between these limits, the integral is in 
part trigonometrical. 

Since $r$ must be less than $n$, it follows that the integral must be trigonometrical if 
these limits are respectively $\le 0$ and $\ge 1$, i.e., if 

$$\big(p+\tfrac1n\big)\big(q+\tfrac1n\big) \ge \tfrac14,$$

or $p$ must lie between $\tfrac12\pm\sqrt{\tfrac1n\big(1+\tfrac1n\big)}$. 

For example, if $n = 100$, then, if $p$ lies between .6005 and .3995, the integral must 
be trigonometrical. If $p$ lies outside these limits, say = .7 for example, then the 
integral will be logarithmic if $r/n$ does not lie between .04 and .96, i.e., if we draw 
a small or large proportion of the total contents. 
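The limits just stated are easily evaluated. The sketch below is a Python illustration only, under the assumption that $p+q = 1$; the function names are hypothetical. With the $1/n$ terms retained, the limits for $p = .7$, $n = 100$ come out slightly wider than the rounded figures quoted above, which appear to correspond to $\tfrac12\pm\sqrt{pq}$.

```python
from math import sqrt

def r_over_n_limits(p, n):
    """Limits 1/2 -/+ sqrt((p + 1/n)(q + 1/n)) between which r/n gives the trigonometrical case."""
    q = 1.0 - p
    half_width = sqrt((p + 1.0 / n) * (q + 1.0 / n))
    return 0.5 - half_width, 0.5 + half_width

def p_limits_always_trigonometric(n):
    """Range of p, 1/2 -/+ sqrt((1/n)(1 + 1/n)), for which every r < n is trigonometrical."""
    d = sqrt((1.0 / n) * (1.0 + 1.0 / n))
    return 0.5 - d, 0.5 + d

def integral_kind(p, n, r):
    """Classify a draw of r out of n as leading to the trigonometrical or logarithmic case."""
    lo, hi = r_over_n_limits(p, n)
    return "trigonometrical" if lo < r / n < hi else "logarithmic"

if __name__ == "__main__":
    print(p_limits_always_trigonometric(100))
    print(r_over_n_limits(0.7, 100))
```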

Let us treat the trigonometrical and logarithmic cases separately. 

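(12.) Case I. $\beta_2^2 < 4\beta_1\beta_3$. 

The curve having the same geometrical slope relation is 

$$\log y = \text{constant} - \frac{1}{2\beta_3}\log(\beta_1+\beta_2x+\beta_3x^2) - \frac{\beta_2}{\beta_3\sqrt{4\beta_1\beta_3-\beta_2^2}}\tan^{-1}\frac{2\beta_3x+\beta_2}{\sqrt{4\beta_1\beta_3-\beta_2^2}}.$$

Write $x$ for $x+\beta_2/(2\beta_3)$, changing the origin; further put $a$ for $\sqrt{4\beta_1\beta_3-\beta_2^2}/(2\beta_3)$, 
$m$ for $1/(2\beta_3)$, and $\nu$ for $\beta_2/\{\beta_3\sqrt{4\beta_1\beta_3-\beta_2^2}\}$; then we have, $y_0$ being a constant of 
integration, 

$$y = \frac{y_0\,e^{-\nu\tan^{-1}(x/a)}}{(1+x^2/a^2)^m}.$$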

This frequency curve is asymmetrical and has an unlimited range on either side of 
the origin. It corresponds accordingly to the curve required as Type IV. 
Here 

$$a = \tfrac14c\sqrt{4(1+pn)(1+qn)-(n-2r)^2},\qquad \nu = \frac{n(n-2r)(p-q)}{\sqrt{4(1+pn)(1+qn)-(n-2r)^2}},\qquad m = \tfrac12(n+2).$$

Special cases. (i.) Suppose $r/n = \chi$, and $n$ very large, then 

$$m/a^2 = \frac{2}{c^2n\{pq-(\tfrac12-\chi)^2\}} = \alpha_1,\ \text{say},$$

$$\nu/a = \frac{2(\tfrac12-\chi)(p-q)}{c\{pq-(\tfrac12-\chi)^2\}} = \alpha_2,\ \text{say}.$$

Thus we have 

$$y = y_0e^{-\alpha_1x^2-\alpha_2x},$$

which reduces to the normal type by a change of origin. It is important to notice, 
however, that the standard deviation of this normal type 

$$= \frac{1}{\sqrt{2\alpha_1}} = \tfrac12c\sqrt{n\{pq-(\tfrac12-\chi)^2\}},$$

and is very different from the value $c\sqrt{(r+1)pq} = \tfrac12c\sqrt{npq\times4\chi}$, nearly, which 
is the usual form. Only when we put $p = q = \tfrac12$ and make $\chi$ small do they agree. 
We thus conclude: That the normal form may fit a chance distribution, but it does 
not follow that the standard deviation is of the binomial type generally assumed. 

(ii.) Suppose $r/n = \tfrac12$, corresponding to the withdrawal of one-half of the contents of 
a vessel, then 

$$y = \frac{y_0}{(1+x^2/a^2)^m},$$

where 

$$a = \tfrac12c\sqrt{(1+pn)(1+qn)},\qquad m = \tfrac12(n+2).$$

This is an unlimited and symmetrical frequency curve approaching more and more 
nearly to the normal form as we increase $n$. It has, however, a standard deviation 
$= \tfrac12c\sqrt{npq}$, while the normal curve would give $\tfrac12c\sqrt{npq\times2}$. 

(iii.) Suppose $p = q = \tfrac12$, we again reach the form 

$$y = \frac{y_0}{(1+x^2/a^2)^m},$$

where 

$$a = \tfrac14c(n+2)\sqrt{1-\frac{(n-2r)^2}{(n+2)^2}},\qquad m = \tfrac12(n+2).$$

Make $n$ infinite and we have again the normal type, but a standard deviation of 
the form $\tfrac12c\sqrt{n\chi(1-\chi)}$, where $\chi = r/n$, only approaching the usual value when $\chi$ is small. 

We postpone until we have discussed the remaining types the problem of fitting a 
curve of Type IV. to a series of observations. 

(13.) Case II. $\beta_2^2 > 4\beta_1\beta_3$. 

Let $a_1$ and $a_2$ be the roots of $\beta_1+\beta_2x+\beta_3x^2 = 0$. Then the curve having the 
same geometrical relation for its slope is 

$$\frac{d\log y}{dx} = -\frac{x}{\beta_3(x-a_1)(x-a_2)},$$

$$\log y = \text{constant} - \frac{1}{\beta_3(a_1-a_2)}\{a_1\log(x-a_1)-a_2\log(x-a_2)\},$$

or, if $1/\nu = \beta_3(a_1-a_2)$, 

$$y = y_0(x-a_1)^{-\nu a_1}(x-a_2)^{\nu a_2},$$

by changing constants. 

Assuming that $y_0$, $\nu$, $a_1$ and $a_2$ can take any sign whatever, we see that there are 
three fundamental subtypes of this frequency curve: 

(i.) $y = y_0(1+x/a_1)^{\nu a_1}(1-x/a_2)^{\nu a_2}$. 

This is an asymmetrical curve with limited range and maximum towards mediocrity. 
As a rule $\nu a_1$ and $\nu a_2$ are fractional and the curve becomes imaginary beyond the 
limits $x = -a_1$ and $x = a_2$. 




(ii.) $y = y_0(x/a_1-1)^{-\nu a_1}(1-x/a_2)^{\nu a_2}$. 

Here the ordinate between $x = a_1$ and $x = a_2$ varies from infinity to zero, and 
resembles the frequency curves given by "wealth" distribution or infant mortality. 

(iii.) $y = y_0(1-x/a_1)^{-\nu a_1}(1+x/a_2)^{-\nu a_2}$. 

This is an asymmetrical curve with limited range, mediocrity being in a minimum. 
The disappearance of mediocrity is not a very uncommon feature of statistics; the 
"prevalence of extremes" may appear not only in meteorological phenomena but in 
competitive examinations, where the mediocre have occasionally sufficient wisdom to 
refrain from entering. The type is that of Mr. F. Galton's curve of "consumptivity."* 
The curve contains an interesting number of less fundamental subtypes. 

(iv.) Make $a_2 = \infty$ in (i.), 

$$y = y_0(1+x/a_1)^{\nu a_1}e^{-\nu x}.$$

This is the limit to the asymmetrical binomial, which has been already referred to 
in § 8. 

(v.) Make $a_1 = a_2$, 

$$y = y_0(1-x^2/a_1^2)^{\nu a_1}.$$

This is the symmetrical frequency curve of limited range. 

(vi.) Make $\nu$ negative in (v.), 

$$y = \frac{y_0}{(1-x^2/a_1^2)^{\nu a_1}}.$$

This is a symmetrical frequency curve, with limited range, and minimum of 
mediocrity. 

(vii.) Put $\nu = pa_1$ in (v.) and make $a_1 = \infty$, 

$$y = y_0e^{-px^2}.$$

This is the normal curve. 

* 'Natural Inheritance,' 1889, p. 174. 

(viii.) Put $a_2 = \infty$ in (ii.), 

$$y = y_0(x/a_1-1)^{-\nu a_1}e^{-\nu x}.$$

This is an asymmetrical frequency curve, with an ordinate varying from $a_1$ to $\infty$ 
along an infinite range. 

All eight of the above types are included in the single form 

$$y = y_0(x-a_1)^{-\nu a_1}(x-a_2)^{\nu a_2},$$

or 

$$y = y_0(1+x/a_1)^{\nu a_1}(1-x/a_2)^{\nu a_2},$$

if we give positive, negative, or limiting values to the constants. But to do this we 
require to give values to $n$ and $r$ in the expressions for $\beta_1$, $\beta_2$, and $\beta_3$, which are not 
easily intelligible, if we rigidly adhere to our example of drawing a definite quantity 
of sand from a limited mixture of two kinds of sand. The last type of curve given 
is, however, the frequency curve for a priori probabilities,* and readily admits of a 
direct interpretation of the following kind. 

Given a line of length $l$, and suppose $r+1$ points placed on it at random; what is 
the frequency with which the point $pr$ from one end and $qr$ from the other of the 
series of $r+1$ points falls on the element $\delta x$ of the line? 
The answer is clearly 

$$\frac{(r+1)!}{(pr)!\,(qr)!}\Big(\frac xl\Big)^{pr}\Big(1-\frac xl\Big)^{qr}\frac{\delta x}{l},$$

or, we have a frequency curve of the type 

$$y = y_0x^{pr}(1-x/l)^{qr}.$$

We may express the problem a little differently. Take $r+1$ cards and slip them 
at random between the pages of a book, the frequency of the page succeeding the 
$(pr+1)$th card is given by the above curve.† 

* See Crofton, "Probability," § 17, 'Encycl. Brit.' 

† The important point to be noticed here is that we are dealing with a distribution in which 
contributory causes are inter-dependent. 

Until we know very much more definitely than we do at present, how the size of 
an organ in any individual, say, depends on the sizes of the same organ in its 
ancestors, or what are the nature of the causes which lead to the determination of 
prices, or of income, or of mortality at a given age, I do not see that we have any 
right to select as our sole frequency curve the normal type 

$$y = y_0e^{-px^2}$$

in preference to the far more general 

$$y = y_0(1+x/a_1)^{\nu a_1}(1-x/a_2)^{\nu a_2},$$

which not only includes the former, but supplies the element of skewness which is 
undoubtedly present in many statistical frequency distributions. As we may look 
upon the former as a limit to a coin -tossing series, so the latter represents a limit to 
teetotum-spinning and card-drawing experiments. It is not easy to realise why 
nature or economics should, from the standpoint of chance, be more akin to tossing 
than to teetotum-spinning or card-dealing. At any rate, from purely utilitarian and 
prudent motives, we are justified so long as the analysis is manageable, in using the 
more general form. It will always give us a measure of the divergence of particular 
statistics from the normal type, and in many cases of skew frequency, it can be used 
when it would be the height of absurdity to apply the normal curve at all. 
Since Types I., II., III., and V. are all represented by the curve 

$$y = y_0(1+x/a_1)^{\nu a_1}(1-x/a_2)^{\nu a_2},$$

and Type IV. by the curve 

$$y = \frac{y_0\,e^{-\nu\tan^{-1}(x/a)}}{(1+x^2/a^2)^m},$$

we have only to deal with these two cases in general. We shall refer, in the 
course of our work, to special simplifications arising in particular sub-cases. After a 
description of the manner in which these generalised probability curves may be fitted 
to statistics, we shall indicate, by examples, their practical applications. 

(14.) On the Generalised Probability Curve. Type I. 

$$y = y_0(1+x/a_1)^{\nu a_1}(1-x/a_2)^{\nu a_2}.$$

Let the range $a_1+a_2 = b$; let $m_1 = \nu a_1$, $m_2 = \nu a_2$, $z = (a_1+x)/(a_1+a_2)$, 
whence $x = -a_1$, $z = 0$ and $x = a_2$, $z = 1$. 
Further let 

$$\eta = y_0\,\frac{(m_1+m_2)^{m_1+m_2}}{m_1^{m_1}m_2^{m_2}},$$

thus $y = \eta z^{m_1}(1-z)^{m_2}$. 

Let $\alpha$ be the area of the curve between $x = -a_1$ and $x = a_2$, $\alpha\mu'_n$ its $n$th moment 
round a parallel to the axis of $y$ through $x = -a_1$ and $\alpha\mu_n$ its $n$th moment round the 
centroid vertical. 

Then we have 

$$\alpha\mu'_n = \int_{-a_1}^{a_2}y\,(a_1+x)^n\,dx = b^{n+1}\eta\int_0^1z^{m_1+n}(1-z)^{m_2}\,dz$$

$$= b^{n+1}\eta\,B(m_1+n+1,\,m_2+1) = b^{n+1}\eta\,\frac{\Gamma(m_1+n+1)\,\Gamma(m_2+1)}{\Gamma(m_1+m_2+n+2)}.$$

Thus, by the fundamental property of the Γ function, we have 

$$\alpha = b\eta\,\Gamma(m_1+1)\,\Gamma(m_2+1)/\Gamma(m_1+m_2+2),$$

$$\mu'_1 = \frac{b(m_1+1)}{m_1+m_2+2},\qquad \mu'_2 = \frac{b^2(m_1+2)(m_1+1)}{(m_1+m_2+3)(m_1+m_2+2)},$$

$$\mu'_3 = \frac{b^3(m_1+3)(m_1+2)(m_1+1)}{(m_1+m_2+4)(m_1+m_2+3)(m_1+m_2+2)},$$

$$\mu'_4 = \frac{b^4(m_1+4)(m_1+3)(m_1+2)(m_1+1)}{(m_1+m_2+5)(m_1+m_2+4)(m_1+m_2+3)(m_1+m_2+2)}.$$

From these we easily deduce by the formulae connecting $\mu$ and $\mu'$, if we write for 
brevity, $m_1+1 = m'_1$, $m_2+1 = m'_2$, and $m'_1+m'_2 = r$: 

$$\mu_2 = \frac{b^2m'_1m'_2}{r^2(r+1)},\qquad \mu_3 = \frac{2b^3m'_1m'_2(m'_2-m'_1)}{r^3(r+1)(r+2)},\qquad \mu_4 = \frac{3b^4m'_1m'_2\{m'_1m'_2(r-6)+2r^2\}}{r^4(r+1)(r+2)(r+3)}.$$

Now, $\alpha$, $\mu_2$, $\mu_3$, and $\mu_4$ are to be found by the methods indicated in Art. 4 from the 
polygon of observations, and may be supposed known quantities, when we are dealing 
with the fitting of a frequency-curve to observations. 

Then, if $\beta_2 = \mu_4/\mu_2^2$ and $\beta_1 = \mu_3^2/\mu_2^3$, $\epsilon = m'_1m'_2$, we have: 

$$\beta_1 = \frac{4(r^2-4\epsilon)(r+1)}{\epsilon(r+2)^2},\qquad \beta_2 = \frac{3(r+1)\{2r^2+\epsilon(r-6)\}}{\epsilon(r+2)(r+3)}.$$

Thus 

$$\frac{\beta_1(r+2)^2}{4(r+1)} = \frac{r^2}{\epsilon}-4,\qquad \frac{\beta_2(r+2)(r+3)}{3(r+1)} = \frac{2r^2}{\epsilon}+r-6,$$

whence, eliminating $r^2/\epsilon$, we find: 

$$r = \frac{6(\beta_2-\beta_1-1)}{3\beta_1-2\beta_2+6}.$$

This gives $r$, then: 

$$\epsilon = \frac{r^2}{4+\tfrac14\beta_1(r+2)^2/(r+1)},$$

$$b^2 = \frac{\mu_2r^2(r+1)}{\epsilon} = \frac{\mu_2\{\beta_1(r+2)^2+16(r+1)\}}{4},$$

or 

$$b = \tfrac12\sqrt{\mu_2}\,\{\beta_1(r+2)^2+16(r+1)\}^{\frac12}.$$

$m'_1$ and $m'_2$ are roots of 

$$m'^2-rm'+\epsilon = 0.$$

Thus $m_1 = m'_1-1$ and $m_2 = m'_2-1$ are determined. 

Further, $a_1+a_2 = b$, $a_1/a_2 = m_1/m_2$, and $\nu = m_1/a_1$ are all determined. 

Lastly: 

$$\eta = y_0\,\frac{(m_1+m_2)^{m_1+m_2}}{m_1^{m_1}m_2^{m_2}}$$

and 

$$\alpha = b\eta\,\Gamma(m_1+1)\,\Gamma(m_2+1)/\Gamma(m_1+m_2+2),$$

give: 

$$y_0 = \frac{\alpha}{b}\,\frac{m_1^{m_1}m_2^{m_2}}{(m_1+m_2)^{m_1+m_2}}\,\frac{\Gamma(m_1+m_2+2)}{\Gamma(m_1+1)\,\Gamma(m_2+1)},$$

which completes the solution,* if a Table of Γ functions is to hand. 
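The chain of formulae above may be followed numerically. What follows is a minimal sketch in Python, offered as an illustration rather than as the author's own computing scheme; the function name `fit_type_one` and the returned dictionary keys are assumptions, and the smaller root is assigned to $m'_1$ when $\mu_3$ is positive, because $\mu_3$ carries the sign of $m'_2-m'_1$.

```python
from math import sqrt

def fit_type_one(mu2, mu3, mu4):
    """Constants of the Type I curve from the central moments mu2, mu3, mu4."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    r = 6.0 * (beta2 - beta1 - 1.0) / (3.0 * beta1 - 2.0 * beta2 + 6.0)
    eps = r ** 2 / (4.0 + 0.25 * beta1 * (r + 2.0) ** 2 / (r + 1.0))
    b = 0.5 * sqrt(mu2) * sqrt(beta1 * (r + 2.0) ** 2 + 16.0 * (r + 1.0))
    # m'_1 and m'_2 are the roots of m'^2 - r m' + eps = 0 (real only if r^2 > 4 eps).
    root = sqrt(r ** 2 - 4.0 * eps)
    small, large = (r - root) / 2.0, (r + root) / 2.0
    m1p, m2p = (small, large) if mu3 >= 0 else (large, small)
    m1, m2 = m1p - 1.0, m2p - 1.0
    a1 = b * m1 / (m1 + m2)          # from a1 + a2 = b and a1/a2 = m1/m2
    a2 = b - a1
    return {"r": r, "eps": eps, "b": b, "m1": m1, "m2": m2, "a1": a1, "a2": a2}

if __name__ == "__main__":
    print(fit_type_one(72.0 / 252.0, 864.0 / 12096.0, 0.0238095))
```

The constant $y_0$ is left out of the sketch, since it needs only the area and the Γ functions once $m_1$, $m_2$ and $b$ are known.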

Remarks. It is clear that the solution is unique. 

It is necessary in order that the solution may be real, that $m'_1$ and $m'_2$ should be 
real or $r^2 > 4\epsilon$. Hence, if $\epsilon$ be negative, there is certainly a solution, because $r$ is 
always real. The solution forms, however, one of the sub-types referred to in our 
Art. 13, (ii) and (iii). 

If $\epsilon$ be positive, we must have $r^2/\epsilon-4$ positive, or 

$$\frac{\beta_1(3+\beta_2)^2}{(6+3\beta_1-2\beta_2)(4\beta_2-3\beta_1)} > 0.$$

Now it is easy to prove that for any curve $4\beta_2-3\beta_1$ or $4\mu_4\mu_2-3\mu_3^2$ is positive, 
for $\mu_4\mu_2$ is always greater than $\mu_3^2$. 

Thus, we must have 

$$6+3\beta_1-2\beta_2 > 0,$$

or 

$$2\mu_2(3\mu_2^2-\mu_4)+3\mu_3^2 > 0.$$



* Very often with sufficient accuracy we may take : 






Now it is theoretically impossible to fit a normal curve ($\mu_4 = 3\mu_2^2$) to a frequency 
distribution for which $\mu_4 > 3\mu_2^2$. It is, however, possible to fit this generalised curve 
of Type I., although $\mu_4$ be $> 3\mu_2^2$, provided there is sufficient skewness to render 

$$2\mu_2(3\mu_2^2-\mu_4)+3\mu_3^2 > 0.$$

Hence the first stage in determining the type of curve suitable for a given set of 
observations is to ascertain the value of 

$$2\mu_2(3\mu_2^2-\mu_4)+3\mu_3^2.$$

If this expression be positive, we see that a limited range of variation is a possibility. 
Passing from range to skewness we remark that the distance $d$ between the centroid 
vertical and the maximum ordinate 

$$= \frac{b\,(m_2-m_1)}{(m_1+m_2)(m_1+m_2+2)}.$$

Now it might seem that $d/b$ would form a good measure of skewness, and it would 
be so if all curves had a limited range. But, as they have not, it seems to me 
better to take as the measure of skewness the ratio of the distance between the 
maximum ordinate and the centroid to the length of the swing radius of the curve 
about the centroid vertical, i.e., the quantity $d/\sqrt{\mu_2}$. 

In our case we have accordingly, 

$$\text{skewness} = \frac{m_2-m_1}{m_1+m_2}\sqrt{\frac{m_1+m_2+3}{(m_1+1)(m_2+1)}} = \tfrac12\sqrt{\beta_1}\,\frac{r+2}{r-2},$$

in our previous notation.* 

Thus range and skewness are determined in Type I. 

* The points of inflexion of the curve are at distances $\pm\sqrt{a_1a_2/(m_1+m_2-1)}$ on either side of the 
maximum ordinate. 

(15.) A very considerable simplification of the above analysis arises when the range 
is given by the conditions of the problem itself, e.g., guessing between two given tints. 
In this we only require the moments $\mu'_1$ and $\mu'_2$ about one end of the range, and the 
solution becomes as easy as in the case of fitting a normal curve. 

Since $b$, $\mu'_1$ and $\mu'_2$ are known, let 

$$\gamma_1 = \mu'_1/b,\qquad \gamma_2 = \mu'_2/(b\mu'_1).$$

Then 

$$\gamma_1 = \frac{m'_1}{m'_1+m'_2},\qquad \gamma_2 = \frac{m'_1+1}{m'_1+m'_2+1},$$

and we have at once 

$$m'_1 = \frac{\gamma_1(\gamma_2-1)}{\gamma_1-\gamma_2},\qquad m'_2 = \frac{(1-\gamma_1)(\gamma_2-1)}{\gamma_1-\gamma_2}.$$

Then $a_1/a_2 = (m'_1-1)/(m'_2-1)$, and $a_1+a_2 = b$ give $a_1$ and $a_2$. Finally $y_0$ is 
given as before by 

$$y_0 = \frac{\alpha}{b}\,\frac{m_1^{m_1}m_2^{m_2}}{(m_1+m_2)^{m_1+m_2}}\,\frac{\Gamma(m_1+m_2+2)}{\Gamma(m_1+1)\,\Gamma(m_2+1)}.$$



(16.) A perhaps still more interesting and usual case arises when one end of the 
range is given, i.e., when $\mu'_1$, but not $b$, is known. For example, a curve of 
distribution of disease with age, the liability to the disease starting with birth. Here we 
require to calculate from the observations $\alpha$, $\mu'_1$, $\mu'_2$ and $\mu'_3$. The solution is as 
follows: 

Let 

$$\chi_2 = \mu'_2/\mu'^2_1,\qquad \chi_3 = \mu'_3/(\mu'_1\mu'_2);$$

then 

$$\chi_2 = \frac{(m'_1+1)(m'_1+m'_2)}{m'_1(m'_1+m'_2+1)} = \frac{1+v}{1+u},\qquad \chi_3 = \frac{(m'_1+2)(m'_1+m'_2)}{m'_1(m'_1+m'_2+2)} = \frac{1+2v}{1+2u},$$

if $v = 1/m'_1$ and $u = 1/(m'_1+m'_2)$. 
Solving 

$$u = \frac{1+\chi_3-2\chi_2}{2(\chi_2-\chi_3)},\qquad v = \frac{2\chi_3-\chi_2-\chi_2\chi_3}{2(\chi_2-\chi_3)}.$$

Thus, 

$$m'_1 = \frac{2(\chi_2-\chi_3)}{2\chi_3-\chi_2-\chi_2\chi_3},\qquad m'_2 = \frac{2(\chi_2-\chi_3)(\chi_3-1)(1-\chi_2)}{(1+\chi_3-2\chi_2)(2\chi_3-\chi_2-\chi_2\chi_3)},$$

$$b = \mu'_1\,\frac{2\chi_3-\chi_2-\chi_2\chi_3}{1+\chi_3-2\chi_2}$$

determines the range. 
Hence, since 

$$a_1+a_2 = b,\quad\text{and}\quad a_1/a_2 = (m'_1-1)/(m'_2-1),$$

we have, with the aid of the previous expression for $y_0$, the complete solution of the 
problem. 
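The solution for a curve whose start of range is known can be traced in a few lines. This is a sketch in Python under the assumption that the moments $\mu'_1$, $\mu'_2$, $\mu'_3$ are taken about the known end of the range; the function name `fit_type_one_known_start` is hypothetical.

```python
def fit_type_one_known_start(mu1p, mu2p, mu3p):
    """Return (m'_1, m'_2, b) from the first three moments about the known end of the range."""
    chi2 = mu2p / mu1p ** 2
    chi3 = mu3p / (mu1p * mu2p)
    u = (1.0 + chi3 - 2.0 * chi2) / (2.0 * (chi2 - chi3))          # u = 1/(m'_1 + m'_2)
    v = (2.0 * chi3 - chi2 - chi2 * chi3) / (2.0 * (chi2 - chi3))  # v = 1/m'_1
    m1p = 1.0 / v
    m2p = 1.0 / u - 1.0 / v
    b = mu1p * v / u                                               # the range
    return m1p, m2p, b

if __name__ == "__main__":
    # Moments about the start of the range for m'_1 = 2, m'_2 = 4, b = 3.
    print(fit_type_one_known_start(1.0, 54.0 / 42.0, 648.0 / 336.0))
```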

(17.) Generalised probability curve of Type II. Limited Range and Symmetry. 

$$y = y_0(1-x^2/a^2)^m.$$

The solution in this case follows very easily from (14) by putting $\beta_1 = 0$; we have at 
once 

$$r = \frac{6(\beta_2-1)}{6-2\beta_2},$$

or 

$$2(m+1) = r = \frac{6(\beta_2-1)}{2(3-\beta_2)} = \frac{6(\mu_4-\mu_2^2)}{2(3\mu_2^2-\mu_4)},$$

$$m = \frac{5\beta_2-9}{2(3-\beta_2)} = \frac{5\mu_4-9\mu_2^2}{2(3\mu_2^2-\mu_4)}.$$

Since $\mu_2 = \dfrac{b^2}{4(r+1)}$, and clearly $\epsilon = \tfrac14r^2$, 

we have $b = 2a = 2\sqrt{\mu_2(r+1)}$, 

or 

$$a = \frac{\sqrt{2\mu_2\beta_2}}{\sqrt{3-\beta_2}} = \frac{\sqrt{2\mu_2\mu_4}}{\sqrt{3\mu_2^2-\mu_4}}.$$

Finally 

$$y_0 = \frac{\alpha}{b}\,\frac{m^{2m}}{(2m)^{2m}}\,\frac{\Gamma(2m+2)}{\{\Gamma(m+1)\}^2} = \frac{\alpha\sqrt{3-\beta_2}}{2\sqrt{2\mu_2\beta_2}}\,\frac{\Gamma(2m+2)}{2^{2m}\{\Gamma(m+1)\}^2}$$

$$= \alpha\sqrt{\frac{3\mu_2^2-\mu_4}{2\mu_2\mu_4}}\,\frac{\Gamma(2m+2)}{2^{2m+1}\{\Gamma(m+1)\}^2} = \alpha\sqrt{\frac{3\mu_2^2-\mu_4}{2\mu_2\mu_4}}\,\frac{\Gamma(m+1.5)}{\sqrt\pi\,\Gamma(m+1)}.$$

For the normal frequency curve $\mu_4 = 3\mu_2^2$, for a symmetrical point-polygon $3\mu_2^2 > \mu_4$. 
Hence, whenever a symmetrical frequency curve differs from the normal curve on the 
side of the point-binomial, we can better the normal solution by taking a symmetrical 
frequency curve of limited range. 
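For the symmetrical curve only two moments are wanted. The following Python sketch (an illustration only; the name `fit_type_two` is an assumption) returns $m$ and the half-range $a$ from $\mu_2$ and $\mu_4$, and declines to proceed when $3\mu_2^2$ does not exceed $\mu_4$.

```python
from math import sqrt

def fit_type_two(mu2, mu4):
    """Return (m, a) for y = y0 (1 - x^2/a^2)^m from the central moments mu2, mu4."""
    beta2 = mu4 / mu2 ** 2
    if beta2 >= 3.0:
        raise ValueError("a limited symmetrical curve requires 3*mu2^2 > mu4")
    m = (5.0 * beta2 - 9.0) / (2.0 * (3.0 - beta2))
    a = sqrt(2.0 * mu2 * beta2 / (3.0 - beta2))
    return m, a

if __name__ == "__main__":
    # The curve y = (1 - x^2)^2 on (-1, 1) has mu2 = 1/7 and mu4 = 1/21.
    print(fit_type_two(1.0 / 7.0, 1.0 / 21.0))
```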



Writing 

$$y = y_0\big\{(1-x^2/a^2)^{a^2}\big\}^{m/a^2},$$

and 

$$\frac{m}{a^2} = \frac{5\beta_2-9}{4\mu_2\beta_2} = \frac{1}{2\mu_2},$$

if $\beta_2 = 3$, we easily trace the transition from the limited symmetrical curve to the 
normal curve with infinite range. 

Quite apart from the extremely interesting problem of finding the range, it is clear 
that better fits will be obtained for symmetrical distributions by the aid of this limited 
range curve for all cases in which $3\mu_2^2 > \mu_4$. 

(18.) Generalised Probability Curve of the Type III. Range limited in one 
direction only. 

$$y = y_0(1+x/a)^{\gamma a}e^{-\gamma x}.$$

In this case we have no need to determine the value of $\mu_4$, and the analysis is much 
simplified by the replacement of the B function by a single Γ function. 
Take $z = \gamma(a+x)$ and write $\gamma a = p$, we have 

$$y = y_0\,\frac{e^p}{p^p}\,z^pe^{-z}.$$

Further, $x = -a$, $z = 0$; $x = \infty$, $z = \infty$. Thus we find 

$$\alpha\mu'_n = \int_{-a}^{\infty}y\,(x+a)^n\,dx = \frac{y_0e^p}{p^p\gamma^{n+1}}\int_0^{\infty}z^{p+n}e^{-z}\,dz.$$

Hence 

$$\alpha\mu'_n = \frac{y_0e^p}{p^p\gamma^{n+1}}\,\Gamma(p+n+1),$$

whence 

$$\alpha = \frac{y_0e^p}{p^p\gamma}\,\Gamma(p+1),$$

$$\mu'_1 = \frac{p+1}{\gamma},\qquad \mu'_2 = \frac{(p+1)(p+2)}{\gamma^2},\qquad \mu'_3 = \frac{(p+1)(p+2)(p+3)}{\gamma^3},\qquad \mu'_4 = \frac{(p+1)(p+2)(p+3)(p+4)}{\gamma^4}.$$

Or, transposing to the centroid-vertical, we have 

$$\mu_2 = \frac{p+1}{\gamma^2},\qquad \mu_3 = \frac{2(p+1)}{\gamma^3},\qquad \mu_4 = \frac{3(p+1)(p+3)}{\gamma^4}.$$

The first two results give us at once 

$$\gamma = 2\mu_2/\mu_3,\qquad p = 4\mu_2^3/\mu_3^2-1,$$

whence 

$$a = \frac p\gamma = \frac{2\mu_2^2}{\mu_3}-\frac{\mu_3}{2\mu_2},\quad\text{and}\quad y_0 = \frac{\alpha}{a}\,\frac{p^{p+1}}{e^p\,\Gamma(p+1)}.$$

This completes the solution of the problem, which is seen to require only the 
determination of $\mu_2$ and $\mu_3$. 
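Since only $\mu_2$ and $\mu_3$ enter, the fit of Type III. reduces to three lines of arithmetic. A minimal Python sketch follows (the function name `fit_type_three` is an assumption, and $\mu_3$ is supposed not to vanish):

```python
def fit_type_three(mu2, mu3):
    """Return (gamma, p, a) for y = y0 (1 + x/a)^(gamma a) e^(-gamma x)."""
    gamma = 2.0 * mu2 / mu3
    p = 4.0 * mu2 ** 3 / mu3 ** 2 - 1.0
    a = p / gamma
    return gamma, p, a

if __name__ == "__main__":
    # p = 3, gamma = 2 give mu2 = (p + 1)/gamma^2 = 1 and mu3 = 2(p + 1)/gamma^3 = 1.
    print(fit_type_three(1.0, 1.0))
```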

Remarks. The distance $d$ of the centroid-vertical from the axis of $y$ or maximum 
ordinate is given by 

$$d = \mu'_1-a = 1/\gamma = \tfrac12\mu_3/\mu_2.$$

Thus 

$$\text{skewness} = d/\sqrt{\mu_2} = \tfrac12\mu_3/\mu_2^{3/2}.$$

If we transfer the origin to the centroid-vertical we have 

$$y = y_0\Big(1+\frac{\gamma x}{p+1}\Big)^pe^{-\gamma x},$$

where 

$$y_0 = \frac{\alpha}{\sqrt{2\pi\mu_2}}\cdot\frac{\sqrt{2\pi}\,(p+1)^{p+\frac12}e^{-(p+1)}}{\Gamma(p+1)}.$$

It is interesting to note how this skew curve passes into the normal curve when 
$\mu_3$ is made vanishingly small, or $p = \infty$. 

By Wallis's theorem the limit to $y_0 = \alpha/\sqrt{2\pi\mu_2}$. 
It remains to find the limit of $(1+u)^pe^{-(p+1)u}$, where $u = \gamma x/(p+1)$ and 
$(p+1)u^2 = x^2/\mu_2$, for $u = 0$. 

Now the limit of $\{(1+u)e^{-u}\}^{1/u^2}$ for $u = 0$ is easily found to be $e^{-\frac12}$, hence 
the normal form. 

Returning to the value we have found for $\mu_4$ and eliminating $p$ and $\gamma$ between $\mu_2$, 
$\mu_3$, and $\mu_4$ we find 

$$2\mu_2(3\mu_2^2-\mu_4)+3\mu_3^2 = 0.$$

This is the expression (see p. 398) which must be positive in the case of limited 
range. It is zero also for the normal curve, because both $3\mu_2^2-\mu_4$ and $\mu_3$ vanish. 
Hence the more nearly the quantity $2\mu_2(3\mu_2^2-\mu_4)+3\mu_3^2$ approaches to zero, the 
more nearly are we able to fit our statistics with a skew frequency-curve having 
a range limited in one direction only. 

(18 bis). The skew frequency-curve of Type III. deserves especial notice. It is 
intermediate between those of Type I. and Type IV., and they differ very little from 
it in appearance. Hence, if the reader has once studied the various forms which 
Type III. can take as we alter its constants, he will grasp at once the forms taken by 
Types I. and IV., by simply considering the range doubly limited or doubly unlimited. 
To assist the process of realising Type III., Plate 9, fig. 5, has been constructed; it 
contains seven sub-types of this species, varying from fig. I., in which the curve is 
asymptotic to the maximum frequency-ordinate, to fig. VII., which is practically 
identical with the normal curve. Taking $y = y_0(1+x/a)^pe^{-\gamma x}$ for the equation 
to the curve, we have the following values for the constants $p$ and $\gamma$: 

    Fig.     p            γ
    I.       -.67         .3
    II.      .001         .2505
    III.     .265         .363
    IV.      1.021        .7676
    V.       1            .5
    VI.      6.5625       4.3125
    VII.     1890         1700

In the diagrams vertical and horizontal scales ($y_0$ and $a$) have been chosen so as 
to illustrate best the changes of shape in the curve. The general correspondence of 
this series with actual types of frequency curve, as indicated in Plate 7, fig. 1, will at 
once strike the reader. 

The mean, the median, and the mode or maximum-ordinate are marked by bb, cc, and 
aa, respectively, and as soon as the curves were drawn, a remarkable relation manifested 
itself between the position of these three quantities: the median, so long as $p$ was 
positive, was seen to be about one-third from the mean towards the maximum. 
For $p$ negative and between $0$ and $-1$, this relation was not true. The distance 
between the maximum-ordinate and the mean is, if the equation to the curve be 

$$y = y_0x^pe^{-\gamma x},$$

equal to $1/\gamma$. Now the maximum cannot be accurately determined from observation, 
but a fair approximation can be made to the median. Hence the constant $\gamma$ could, if 
the above graphical relation were shown to be always true, be determined 
approximately as the inverse of thrice the distance between median and mean. 

Now distance of mean from origin $= (p+1)/\gamma$, 
and distance of maximum from origin $= p/\gamma$. 

Hence, supposing distance of median $= (p+c)/\gamma$, we should expect to find 
$c = 2/3$ about. 
Equating the integral which gives the area up to the median to half the total 
area, we have 

$$y_0\int_{(p+c)/\gamma}^{\infty}x^pe^{-\gamma x}\,dx = \tfrac12y_0\int_0^{\infty}x^pe^{-\gamma x}\,dx,$$

or, 

$$\int_{p+c}^{\infty}z^pe^{-z}\,dz = \tfrac12\int_0^{\infty}z^pe^{-z}\,dz.$$

This is the equation for $c$. Unable to solve it generally I gave $p$ a series of integer 
values and found in all cases $c$ nearly .67. Its value, however, decreased as $p$ 
increased. I, therefore, assumed $c$ to be really of the form $c = c_1+c_2/p$, and 
determining $c_1$ and $c_2$ by the method of least squares, found 

$$c = .6691+.0094/p.$$

Probably this is only the beginning of a rapidly converging series in inverse powers 
of $p$, but it would appear to suffice for most practical purposes. It is only true for 
$p > 1$ and does not explain why, when $p$ is positive and fractional, $c$ is still apparently 
near $\tfrac23$; thus its value for $p = 0$ has only risen to .6931. We have then the following 
fairly simple means of determining roughly the constants of a skew curve of this type: 

(1.) Find the mean and the median; these give $\gamma$, approximately. 

(2.) Find $\mu_2$ for the mean; this gives $p$, since $\mu_2 = (p+1)/\gamma^2$. 

(3.) Knowing $p$, correct the value of $\gamma$ by using the above value for $c$, and so obtain 
a corrected $p$. 

(4.) Determine $y_0$ from the area. 

This method is not very laborious and may be of service in some cases.* It will, 
of course, fail for any curves in which $p$ is negative, and must only be applied when 
the curve is known to be of Type III. If the beginning of the range is definitely 
known, we may save stage (2) above and find $p$ from the distance of the mean from 
the start of the range. 
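Both the equation for $c$ and the rough procedure of stages (1.) to (3.) can be tried numerically. The Python sketch below is illustrative only: the function names are assumptions, the median is found by bisection for integer $p$ (using the finite series for the upper tail of the integral, a standard identity not given in the text), and stage (3.) applies the fitted value $c = .6691+.0094/p$ once.

```python
from math import exp, log

def gamma_median(p, tol=1e-12):
    """Median z of z^p e^(-z) on (0, inf), for a non-negative integer p."""
    def upper_tail(z):
        # For integer p: (1/p!) * integral_z^inf t^p e^(-t) dt = e^(-z) * sum_{k<=p} z^k / k!
        term, total = 1.0, 1.0
        for k in range(1, p + 1):
            term *= z / k
            total += term
        return exp(-z) * total
    lo, hi = 0.0, p + 2.0          # the median lies below the mean p + 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if upper_tail(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def median_constant(p):
    """The constant c, with the median at (p + c)/gamma."""
    return gamma_median(p) - p

def rough_type_three(mean, median, mu2):
    """Stages (1.) to (3.): rough gamma and p from mean, median and mu2."""
    gap = mean - median            # equals (1 - c)/gamma
    gamma = 1.0 / (3.0 * gap)      # stage (1.): c taken as 2/3
    p = gamma ** 2 * mu2 - 1.0     # stage (2.): mu2 = (p + 1)/gamma^2
    c = 0.6691 + 0.0094 / p        # stage (3.): fitted value of c, for p > 1
    gamma = (1.0 - c) / gap
    p = gamma ** 2 * mu2 - 1.0
    return gamma, p

if __name__ == "__main__":
    for p in (0, 1, 2, 5, 10):
        print(p, round(median_constant(p), 4))
```

For integer $p$ the computed $c$ falls slowly from .6931 at $p = 0$ towards two-thirds, in agreement with the behaviour described above.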



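(19.) Generalised Probability Curve of Type IV. Range unlimited, but form skew. 

$$y = \frac{y_0\,e^{-\nu\tan^{-1}(x/a)}}{\{1+(x/a)^2\}^m}.$$

Put $x = a\tan\theta$, hence 

$$y = y_0\cos^{2m}\theta\,e^{-\nu\theta},$$

$$\alpha\mu'_n = \int_{-\infty}^{+\infty}yx^n\,dx = y_0a^{n+1}\int_{-\pi/2}^{\pi/2}\cos^{2m-n-2}\theta\,\sin^n\theta\,e^{-\nu\theta}\,d\theta$$

$$= y_0a^{n+1}\int_{-\pi/2}^{\pi/2}\cos^{r-n}\theta\,\sin^n\theta\,e^{-\nu\theta}\,d\theta,\quad\text{if } r = 2m-2,$$

$$= \frac{y_0a^{n+1}}{r-n+1}\Big\{(n-1)\int_{-\pi/2}^{\pi/2}\cos^{r-n+2}\theta\,\sin^{n-2}\theta\,e^{-\nu\theta}\,d\theta - \nu\int_{-\pi/2}^{\pi/2}\cos^{r-n+1}\theta\,\sin^{n-1}\theta\,e^{-\nu\theta}\,d\theta\Big\}$$

$$= \frac{a}{r-n+1}\big\{(n-1)\,a\,\alpha\mu'_{n-2}-\nu\,\alpha\mu'_{n-1}\big\},$$

provided $r > n-1$. 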



* The points of inflexion may also occasionally be found from the observations; they are at distances 
$\pm\sqrt p/\gamma$ on either side of the maximum ordinate. 

Thus, if we know $\alpha$ and $\mu'_1$, we can find the successive $\mu'_n$'s. Now 

$$\alpha = y_0a\int_{-\pi/2}^{\pi/2}\cos^r\theta\,e^{-\nu\theta}\,d\theta = y_0a\,e^{-\frac12\nu\pi}\int_0^{\pi}\sin^r\theta\,e^{\nu\theta}\,d\theta,$$

and depends on the integral $\int_0^{\pi}\sin^r\theta\,e^{\nu\theta}\,d\theta$, which I propose to write $G(r,\nu)$. The 
integral $\int_{-\pi/2}^{\pi/2}\cos^{r-n}\theta\,\sin^n\theta\,e^{-\nu\theta}\,d\theta$ can 
always be expressed in terms of G-functions. Further: 

$$\alpha\mu'_1 = y_0a^2\int_{-\pi/2}^{\pi/2}\cos^{r-1}\theta\,\sin\theta\,e^{-\nu\theta}\,d\theta = -\frac{y_0a^2\nu}{r}\int_{-\pi/2}^{\pi/2}\cos^r\theta\,e^{-\nu\theta}\,d\theta = -\frac{a\nu}{r}\,\alpha.$$

Thus we find by the formula of reduction above: 

$$\mu'_1 = -\frac{a\nu}{r},\qquad \mu'_2 = \frac{a^2}{r(r-1)}(r+\nu^2),\qquad \mu'_3 = -\frac{a^3\nu}{r(r-1)(r-2)}(3r-2+\nu^2),$$

$$\mu'_4 = \frac{a^4}{r(r-1)(r-2)(r-3)}\{3r(r-2)+\nu^2(6r-8)+\nu^4\}.$$

Referring to centroid vertical, we have: 

$$\mu_2 = \frac{a^2(r^2+\nu^2)}{r^2(r-1)},\qquad \mu_3 = -\frac{4a^3\nu(r^2+\nu^2)}{r^3(r-1)(r-2)},$$

$$\mu_4 = \frac{3a^4(r^2+\nu^2)\{(r+6)(r^2+\nu^2)-8r^2\}}{r^4(r-1)(r-2)(r-3)}.$$
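The formula of reduction may be checked against the closed expressions for the central moments. The following Python sketch is an illustration only (function names are assumptions); it starts from $\mu'_0 = 1$ and $\mu'_1 = -a\nu/r$ and applies $\mu'_n = \dfrac{a}{r-n+1}\{(n-1)a\mu'_{n-2}-\nu\mu'_{n-1}\}$.

```python
def type_four_raw_moments(a, nu, r, order=4):
    """Moments mu'_0 .. mu'_order of the Type IV curve about x = 0, by the reduction formula."""
    mu = [1.0, -a * nu / r]
    for n in range(2, order + 1):
        if r <= n - 1:
            raise ValueError("the reduction formula requires r > n - 1")
        mu.append(a / (r - n + 1) * ((n - 1) * a * mu[n - 2] - nu * mu[n - 1]))
    return mu

def central_from_raw(mu):
    """Second, third and fourth moments about the centroid from mu'_1 .. mu'_4."""
    m1, m2, m3, m4 = mu[1], mu[2], mu[3], mu[4]
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return mu2, mu3, mu4

if __name__ == "__main__":
    print(central_from_raw(type_four_raw_moments(1.5, 0.8, 6.0)))
```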



These may be rewritten, if $z = r^2+\nu^2$: 

$$\mu_2 = \frac{a^2z}{r^2(r-1)},\qquad \mu_3 = -\frac{4a^3z\sqrt{z-r^2}}{r^3(r-1)(r-2)},\qquad \mu_4 = \frac{3a^4z\{(r+6)z-8r^2\}}{r^4(r-1)(r-2)(r-3)}.$$


As before, putting $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$ we have 

$$\frac{\beta_1(r-2)^2}{2(r-1)} = 8-\frac{8r^2}{z},\qquad \frac{\beta_2(r-2)(r-3)}{3(r-1)} = r+6-\frac{8r^2}{z}.$$

Subtracting and dividing out by $r-2$, we have 

$$r = \frac{6(\beta_2-\beta_1-1)}{2\beta_2-3\beta_1-6};$$

hence 

$$m = \tfrac12(r+2)$$

is known. Further 

$$z = \frac{r^2}{1-\dfrac{\beta_1}{16}\,\dfrac{(r-2)^2}{r-1}}$$

is known, whence 

$$\nu = \sqrt{z-r^2}$$

is given.* Finally 

$$a = \sqrt{\frac{\mu_2r^2(r-1)}{z}}$$

and 

$$y_0 = \frac{\alpha\,e^{\frac12\nu\pi}}{a\displaystyle\int_0^{\pi}\sin^r\theta\,e^{\nu\theta}\,d\theta}$$

completely determine the problem. 
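The constants of Type IV. follow from the moments in the order just given. A minimal Python sketch is appended by way of illustration; the function name `fit_type_four` is an assumption, and the sign of $\nu$ is taken opposite to that of $\mu_3$, in accordance with the expression for $\mu_3$ above. The constant $y_0$ is omitted, as it needs the integral $G(r,\nu)$.

```python
from math import sqrt

def fit_type_four(mu2, mu3, mu4):
    """Return (r, m, nu, a) of the Type IV curve from the central moments."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    r = 6.0 * (beta2 - beta1 - 1.0) / (2.0 * beta2 - 3.0 * beta1 - 6.0)
    m = 0.5 * (r + 2.0)
    z = r ** 2 / (1.0 - beta1 * (r - 2.0) ** 2 / (16.0 * (r - 1.0)))
    nu = sqrt(z - r ** 2)
    if mu3 > 0:
        nu = -nu                   # mu3 = -4 a^3 nu z / {r^3 (r-1)(r-2)}
    a = sqrt(mu2 * r ** 2 * (r - 1.0) / z)
    return r, m, nu, a

if __name__ == "__main__":
    print(fit_type_four(0.4596, -0.09192, 0.830336))
```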

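Remarks. The solution is clearly unique.

(i.) To determine the skewness we must find the position of the ordinate for which
$dy/dx = 0$; this is $x' = -\nu a/(2m) = -\nu a/(r+2)$.

But the distance $d$ between the centroid vertical and this ordinate is

$$d = \frac{\nu a}{r} - \frac{\nu a}{r+2} = \frac{2\nu a}{r(r+2)},$$

therefore

$$\text{skewness} = \frac{d}{\sqrt{\mu_2}} = \frac{2\nu\sqrt{r-1}}{(r+2)\sqrt{r^2+\nu^2}} = \tfrac{1}{2}\sqrt{\beta_1}\;\frac{r-2}{r+2}.$$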
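
As a check on the two forms of the skewness in remark (i.), the following sketch evaluates both. The function names are hypothetical, and the numbers fed in are the constants printed for Examples II and IV of Part II; it is offered as an illustration under those assumptions only.

```python
import math

def type_iv_skewness(r, nu, a):
    """Return (d, skewness) with d = 2*nu*a / (r*(r+2)) and skewness = d / sqrt(mu2)."""
    d = 2.0 * nu * a / (r * (r + 2.0))
    mu2 = a * a * (r * r + nu * nu) / (r * r * (r - 1.0))
    return d, d / math.sqrt(mu2)

def skewness_from_beta1(beta1, r):
    """Equivalent form: (1/2) * sqrt(beta1) * (r - 2) / (r + 2)."""
    return 0.5 * math.sqrt(beta1) * (r - 2.0) / (r + 2.0)

# Example IV constants as printed: r, nu, a and beta1
d, sk = type_iv_skewness(30.8023, 4.56967, 14.9917)
print(d, sk, skewness_from_beta1(0.0123784, 30.8023))
```

Both routes should agree with each other and with the printed d = .135606 and skewness = .04885, apart from rounding in the printed constants.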



(ii.) We further notice that

$$r - 1 = \frac{4\beta_2 - 3\beta_1}{2\beta_2 - 3\beta_1 - 6}.$$

Hence, since $4\beta_2$ is always $> 3\beta_1$ (see p. 369), it follows, since $r > 1$, that we must
have

$$2\beta_2 - 3\beta_1 - 6 > 0,$$

or

$$2\mu_2(3\mu_2^2 - \mu_4) + 3\mu_3^2 < 0.$$

* Whether we give $\nu$ the $-$ or $+$ sign will depend upon the sign of $\mu_3$ in the actual statistics.
Thus this expression is again critical for the class of curve with which we are
dealing. We may say that a skew frequency curve will have limited range, range
limited in one direction only, or unlimited range according as

$$2\mu_2(3\mu_2^2 - \mu_4) + 3\mu_3^2$$

is greater than, equal to or less than zero. Thus the calculation of this expression is
the first step towards the classification of a frequency curve given by observation.
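
The first step of the classification may be sketched as follows. The function names are hypothetical; the moments used are those printed in Part II for the enteric fever cases (Example VI) and the Bavarian skulls (Example V).

```python
def range_criterion(mu2, mu3, mu4):
    """Critical expression 2*mu2*(3*mu2**2 - mu4) + 3*mu3**2."""
    return 2.0 * mu2 * (3.0 * mu2 ** 2 - mu4) + 3.0 * mu3 ** 2

def classify_range(mu2, mu3, mu4):
    """Positive: limited range; zero: limited one way only; negative: unlimited range."""
    k = range_criterion(mu2, mu3, mu4)
    if k > 0.0:
        return "limited range"
    if k < 0.0:
        return "unlimited range"
    return "range limited in one direction only"

print(classify_range(4.070554, 7.598196, 69.379605))    # enteric fever moments
print(classify_range(12.027166, 3.707179, 527.91696))   # skull index moments
```

With observed moments the expression will hardly ever vanish exactly, so in practice the middle case is recognised by the expression being small, as the later examples show.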

(iii.) It is noteworthy that the values we have obtained for $r$, $z$, $a$, $\nu$ and $y_0$ will be
real and possible if $r > 1$. On the other hand we have required in our work that $r$
should be $> 3$. I propose now to return to this point. So long as $r > 1$ the values
of both $\mu'_1$ and $\mu_2$ will be finite, but the values of $\mu'_3$ and $\mu'_4$ and consequently of $\mu_3$
and $\mu_4$ will be infinite if $r$ be $< 3$. That is to say, the third and fourth moments
of the curve about the centroid vertical become infinite. This is quite conceivable
from the geometrical standpoint, and various interesting questions, of purely
theoretical value however, arise according as $r > 1$ and $< 2$, i.e., $\mu_3$ and $\mu_4$ are both
infinite, or $r > 2$ and $< 3$, i.e., $\mu_4$ alone is infinite. The solution we have given fails
in these cases. We should obtain, however, finite relations between the four constants
of the equation to the curve by taking the first and second moments $\alpha\mu''_1$ and $\alpha\mu''_2$
round the axis of $x$; we find in this case

$$\alpha\mu''_1 = \tfrac{1}{2}\, y_0^2\, a \int_{-\pi/2}^{\pi/2} \cos^{2r+2}\theta\, e^{-2\nu\theta}\, d\theta,$$

$$\alpha\mu''_2 = \tfrac{1}{3}\, y_0^3\, a \int_{-\pi/2}^{\pi/2} \cos^{3r+4}\theta\, e^{-3\nu\theta}\, d\theta,$$

or,

$$\mu''_1 = \tfrac{1}{2}\, y_0\, e^{-\frac{1}{2}\nu\pi}\, G(2r+2,\, 2\nu)/G(r, \nu),$$

$$\mu''_2 = \tfrac{1}{3}\, y_0^2\, e^{-\nu\pi}\, G(3r+4,\, 3\nu)/G(r, \nu).$$

These results together with

$$\mu_2 = \frac{a^2(r^2+\nu^2)}{r^2(r-1)}, \qquad \alpha = y_0\, a\, e^{-\frac{1}{2}\nu\pi}\, G(r, \nu),$$

are theoretically sufficient to determine the four constants $r$, $\nu$, $y_0$ and $a$. Practically
they would hardly be of service without very elaborate tables of the G functions.

As a matter of fact, we are very unlikely in dealing with actual statistics to meet
with cases in which $\mu_3$ and $\mu_4$ become infinite, because neither the range of observations,
nor the size of the groups observed at great distances from the origin can be
infinite. With finite values of $\mu_3$ and $\mu_4$, it is, however, easy to see that we always
obtain from our solution on page 377 a value of $r > 3$, so that the solution is
self-consistent.
(iv.) It remains to say a few words about the integral

$$G(r, \nu) = \int_0^{\pi} \sin^{r}\theta\, e^{\nu\theta}\, d\theta.$$

Provided $r > 1$, we find a formula of reduction

$$G(r, \nu) = \frac{r(r-1)}{r^2 + \nu^2}\, G(r-2, \nu).$$

Thus the value of the integral from $r = 0$ to $r = 2$ only will be required for diverse
values of $\nu$. The integral does not yet appear to have been studied at length or
tabulated. Dr. A. R. Forsyth* has kindly answered my inquiry for a fairly easy
method of reducing $G(r, \nu)$ for purposes of calculation, by sending me the formula

$$G(r, \nu) = \frac{\pi\, e^{\frac{1}{2}\nu\pi}\, \Pi(r)}{2^{r}\, \Pi\{\tfrac{1}{2}(r + \nu i)\}\, \Pi\{\tfrac{1}{2}(r - \nu i)\}},$$

where $\Pi$ is Gauss's function such that

$$\Pi(n) = \Gamma(n + 1).$$

Taking as definition of $\Pi$ that

$$\Pi(z) = \text{limit of } \frac{1 \cdot 2 \cdot 3 \ldots n}{(z+1)(z+2) \ldots (z+n)}\, n^{z}$$

when $n$ is infinite, we can reduce the above expression to the form

$$G(r, \nu) = \frac{2^{-r}\, \pi\, e^{\frac{1}{2}\nu\pi}\, \Gamma(r+1)}{\{\Gamma(\tfrac{1}{2}r + 1)\}^2}\ \prod_{n=1}^{\infty} \left(1 + \frac{\nu^2}{4n^2 (1 + r/2n)^2}\right).$$

Here, since $r$ can always be supposed to lie between 0 and 2, when $\nu$ is small a few
terms of the product will generally suffice for the calculation of $G(r, \nu)$ to the degree
of accuracy required in statistical practice.

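On the other hand when $r$ is large, i.e., generally in cases of slight skewness, I find

$$G(r, \nu) = \sqrt{\frac{2\pi}{r}}\,(\cos\phi)^{r+1}\, e^{\frac{1}{2}\nu\pi + \phi\nu + \frac{1}{12r} - \frac{\cos^2\phi}{3r}},$$

if $\tan\phi = \nu/r$, very nearly.
Hence

$$y_0 = \frac{\alpha}{a} \sqrt{\frac{r}{2\pi}}\ \frac{e^{\frac{\cos^2\phi}{3r} - \frac{1}{12r} - \phi r \tan\phi}}{(\cos\phi)^{r+1}}$$

very nearly.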
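
In the absence of tables, the G integral can be evaluated by ordinary quadrature. The sketch below uses a composite Simpson rule, which is a choice made here for illustration and not a method given in the text; the function names are hypothetical. It compares the value of y0 obtained from the quadrature with the large-r approximation, using the constants printed for Example II (α = 999, a = 21.909, r = 71.624, ν = 25.7616).

```python
import math

def G(r, nu, n=20000):
    """Composite Simpson estimate of the integral of sin(t)**r * exp(nu*t) over [0, pi]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = math.pi / n

    def f(t):
        # the clamp guards against a tiny negative sine from rounding near t = pi
        return max(math.sin(t), 0.0) ** r * math.exp(nu * t)

    total = f(0.0) + f(math.pi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

def y0_exact(alpha, a, r, nu):
    """y0 = alpha * exp(nu*pi/2) / (a * G(r, nu))."""
    return alpha * math.exp(0.5 * math.pi * nu) / (a * G(r, nu))

def y0_large_r(alpha, a, r, nu):
    """Large-r approximation, with tan(phi) = nu / r."""
    phi = math.atan(nu / r)
    expo = math.cos(phi) ** 2 / (3.0 * r) - 1.0 / (12.0 * r) - phi * nu
    return (alpha / a) * math.sqrt(r / (2.0 * math.pi)) * math.exp(expo) / math.cos(phi) ** (r + 1.0)

# Example II constants as printed in Part II
print(y0_exact(999.0, 21.909, 71.624, 25.7616))
print(y0_large_r(999.0, 21.909, 71.624, 25.7616))
```

The same function `G` can be used to confirm the formula of reduction numerically, for instance by comparing `G(3.5, 0.8)` with `3.5 * 2.5 / (3.5**2 + 0.8**2) * G(1.5, 0.8)`.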

* " Evaluation of two Definite Integrals," ' Quarterly Journal of Mathematics,' January, 1895. 



MR. K. PEARSON ON THE MATHEMATICAL THEORY OF EVOLUTION. 381

(20.) We have now considered methods for fully investigating whether a given
system of measurements has a limited range, and for ascertaining the degree of
skewness of the system.

Analytically, our work may be expressed as follows:

The slope of the normal curve is given by a relation of the form

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x}{c_1}.$$

The slope of the curve correlated to the skew binomial as the normal curve to the
symmetrical binomial is given by a relation of the form

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x}{c_1 + c_2 x}.$$

Finally, the slope of the curve correlated to the hypergeometrical series (which
expresses a probability distribution in which the "contributory causes" are not
independent, and not equally likely to give equal deviations in excess and defect) as
the above curves to their respective binomials is given by a relation of the form

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x}{c_1 + c_2 x + c_3 x^2}.$$

This latter curve comprises the other two as special cases, and so far as my 
investigations have yet gone practically covers all homogeneous statistics that I have 
had to deal with. Something still more general may be conceivable, but I have 
hitherto found no necessity for it. 
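
That the most general slope relation contains the other two may be illustrated by integrating it numerically. The sketch below is an illustration only: the function name `log_y`, the use of Simpson's rule, and the trial values of c1 and c2 are all assumptions made here, and the denominator is supposed not to vanish between 0 and x.

```python
import math

def log_y(x, c1, c2=0.0, c3=0.0, steps=2000):
    """Simpson estimate of log(y(x)/y(0)) from (1/y) dy/dx = -x / (c1 + c2*x + c3*x**2).

    Assumes the denominator does not vanish between 0 and x.
    """
    def slope(t):
        return -t / (c1 + c2 * t + c3 * t * t)

    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = x / steps
    total = slope(0.0) + slope(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * slope(i * h)
    return total * h / 3.0

# c2 = c3 = 0 gives the normal curve: log y = -x**2 / (2*c1)
print(log_y(1.0, 2.0), -1.0 / (2.0 * 2.0))

# c3 = 0 gives the limit of the skew binomial, which has a closed form for comparison
c1, c2, x = 2.0, 0.5, 1.0
print(log_y(x, c1, c2), -x / c2 + (c1 / c2 ** 2) * math.log(1.0 + c2 * x / c1))
```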

To demonstrate its fitness and the importance of these generalised frequency 
distributions for various problems in physics, economics, and biology, I have devoted 
the remainder of this paper to the consideration of special cases of actual statistics. 



Part II. — Statistical Examples. 

(21.) Quetelet, who often foreshadowed statistical advances without perceiving
the method by which they might be scientifically dealt with, has treated of the subject
of limits in Lettre XXII of his "Lettres sur la Théorie des Probabilités" (1846). He
seems to have been conscious that certain variations in excess or defect might
biologically or physically be impossible, and he accordingly introduces the terms Limites
extraordinaires en plus et en moins to mark the range of possible variation. He
makes no attempt to show how this range may be found from a given set of statistics.

"Lorsqu'on suppose le nombre des observations infini, on peut porter les écarts à des
distances également infinies de la moyenne, et trouver toujours des probabilités qui y
correspondent. Cette conception mathématique ne peut évidemment s'accorder avec
ce qui est dans la nature. . . . Les limites extraordinaires au delà desquelles se
trouvent les monstruosités, me semblent plus difficiles à fixer." [When the number of
observations is supposed infinite, the deviations may be carried to equally infinite
distances from the mean, and probabilities corresponding to them can always be found.
This mathematical conception clearly cannot accord with what exists in nature. . . .
The extraordinary limits beyond which monstrosities are found seem to me more
difficult to fix.]

Indeed Quetelet's attempt to fix these limits in the case of the height of human
beings at 2.801 and 0.433 metres is purely empirical, and scientifically worthless.

I propose in this the first section of the practical part of this paper to consider how
far the theory we have developed in the first part enables us to find the range in
various groups of physical and biological phenomena.

Example I. The Range of the Barometer. The following results for the curve of
barometric heights are given on p. 352,

$$\alpha = 171.6, \quad \mu_2 = 10.1400, \quad \mu_3 = 15.95, \quad \mu_4 = 326.34.$$

We have accordingly:

$$2\mu_2(3\mu_2^2 - \mu_4) + 3\mu_3^2 = 400.581,$$

that is, this expression is positive, and we have a limited range.
We have further: $\beta_1 = 0.24401$, $\beta_2 = 3.17391$.
Hence, determining the constants in the manner described in § 14, we have:

$$r = 30.1382, \quad \epsilon = 150.7954, \quad b = 43.61016,$$

$$m_1 = 5.3352, \quad a_1 = 8.2688,$$

$$m_2 = 22.8030, \quad a_2 = 35.3414.$$

Next to find $d$, giving the distance of the centroid from the origin, or the distance
on barometer between mean and maximum, we have by p. 370

$$d = 0.8983.$$

Thus

Range of barometer above mean = 9.1671,
Range of barometer below mean = 34.4431.

Now, in the scale upon which our curve is drawn in Plate 10, fig. 6, each centimetre
equals 1/10 inch, and the mean barometer in Dr. Venn's results equals about 29.931 in.
Thus the maximum possible = 30.85 in. and the minimum possible = 26.49 in.; the range
of the barometer being about 4.36 in. Now, the highest barometer in Dr. Venn's record
= 30.7 in., and the lowest 28.7 in.; it is clear, therefore, that we reach much nearer in
practice to the upper than to the lower limit of the barometric range.* The result
here obtained for the barometric range is of course only tentative and approximate.
Far larger statistics must be dealt with, and for a greater variety of places; we shall
then be better able to judge how far the range, as ascertained from Dr. Venn's
statistics, is local, or if general, what modification or correction may be required.
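
The arithmetic leading from the constants of the curve to the limits in inches may be set out as a short sketch. The function name is hypothetical; the constants a1, a2 and d, the scale of one-tenth of an inch to the centimetre, and the mean of 29.931 in. are those stated in the text.

```python
def barometric_limits(a1, a2, d, mean_inches, inches_per_cm=0.1):
    """Return (range above mean, range below mean, upper limit, lower limit).

    The two ranges are in centimetres of the diagram; the limits are in inches.
    """
    above = a1 + d
    below = a2 - d
    upper = mean_inches + above * inches_per_cm
    lower = mean_inches - below * inches_per_cm
    return above, below, upper, lower

above, below, upper, lower = barometric_limits(8.2688, 35.3414, 0.8983, 29.931)
print(round(above, 4), round(below, 4))
print(round(upper, 2), round(lower, 2), round(upper - lower, 2))
```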
Calculating the value of $y_0$, we find for the curve of barometric heights:

$$y = 21.642\,(1 + x/8.2688)^{5.3352}\,(1 - x/35.3414)^{22.8030}.$$



This curve is traced on Plate 10, fig. 6. It will be seen to be extremely close to the 
observations. 

Although the expression $2\mu_2(3\mu_2^2 - \mu_4) + 3\mu_3^2$ is not zero, it is interesting to see
with what closeness the skew curve which is the limit to a point binomial can be
fitted to the barometric observations. This is the curve of Type III. Calculating
its constants by aid of § 18, we find

$$y = 22[\ldots]\,(1 + x/12.1063)^{[\ldots]}\, e^{-1.2715x},$$

while $d$, the distance between the maximum ordinate and the centroid-vertical,
= 0.7864. This gives a maximum possible height of the barometer of 31.22 in. instead
of 30.85 in., there being of course no lower limit. The curve is shown in Plate 10,
fig. 6, and will be seen to give a very close correspondence with the observations.
The "skewness" of barometric results as given by the curve with limited range
= 0.8983/3.184 = 0.2821, and as given by the curve of Type III. = 0.7864/3.184
= 0.2470, no very great difference.

The areal deviations of the two curves are almost exactly the same, being about
7.1 sq. centims. or percentage error of 4.1. The normal curve is also drawn on
the same plate. It diverges widely from the observations, the areal deviation
= 26 sq. centims. or the percentage error 15.1, about 3.7 times as great as in the
case of either skew probability curve.

Till a wider range of barometric observations have been analysed, it may be wiser 
not to draw too definite conclusions from the above results, contenting ourselves with 
the remark that the new skew curve gives far better results than the old normal 
curve of errors. 

* I am unaware if Dr. Venn's results are reduced to sea-level. The lowest recorded barometric
height for the British Isles reduced to sea-level is 27.333 in. (at Ochtertyre, Perthshire, January 26, 1884)
and the highest (at Roche's Point, Cork, February 20, 1882) is 30.93 in. A statement that the barometer
stood at 31.046 in. at Gordon Castle, in January, 1820, has hardly sufficient evidence. Supposing Dr. Venn's
statistics to be unreduced Cambridge statistics, the expression theoretically found for the barometric
range seems to be on the whole satisfactory. I have at present in hand other series of barometric
heights.
Example II. Professor Weldon's Crab Measurements No. 4. The details of
these are given in 'Phil. Trans.,' vol. 185, p. 96.

We have

$$\alpha = 999, \quad \mu_2 = 7.6759, \quad \mu_3 = 3.4751,$$

$$\mu_4 = 184.3039, \quad \beta_1 = 0.0267022, \quad \beta_2 = 3.12807.$$

In this case

$$2\mu_2(3\mu_2^2 - \mu_4) + 3\mu_3^2 = \mu_2^3\,(6 + 3\beta_1 - 2\beta_2) = -\mu_2^3 \times 0.1760334,$$

and is accordingly negative. In Example I. of the barometric heights we had

$$2\mu_2(3\mu_2^2 - \mu_4) + 3\mu_3^2 = \mu_2^3 \times 0.38421.$$

Since, in the latter case, this value was sufficiently small to give a good curve of
Type III., we may expect the like result in this case. There is, indeed, a slight but
sensible skewness even in this the most symmetrical of all Professor Weldon's crab
measurements, and the skew curve of Type III. is really a better fit than the
normal curve. But clearly since the critical function is negative, we are dealing
properly with a case of a curve of Type IV. The ratio of the organs dealt with in
No. 4 series of measurements does not give a "limited range" of variation.
Proceeding by the method indicated in § 19, we find for the constants



$$r = 71.624, \quad m = 36.812, \quad \nu = 25.7616,$$

$$a = 21.909, \quad \nu a/r = 7.8802, \quad y_0 = 1.75509,*$$

$$d = 0.21407, \quad \text{Skewness} = 0.077267.$$

Thus the equation to the curve is:

$$y = 1.75509\,\left(1 + \frac{x^2}{(21.909)^2}\right)^{-36.812} e^{-25.7616\,\tan^{-1}(x/21.909)}.$$

To trace the curve, take:

$$x = 21.909 \tan\theta,$$

$$y = 1.75509\, \cos^{73.624}\theta\; e^{-25.7616\,\theta}.$$

If we take a skew curve of Type III., we find for its equation:

$$y = 144.22\,(1 + x/33.683)^{[\ldots]}\, e^{-4.41766x},$$

where, for the centroid, $d = 0.226364$, and the skewness $= 0.081704$.

For the normal curve we have:

$$y = 143.85\; e^{-x^2/(2\mu_2)}.$$

* $y_0$ was calculated by aid of the approximate formula on p. 380.
All three curves are drawn in fig. 4 of Plate 8. It will be seen that they are all
very close to the observations. So far as skewness is concerned, curves of Types III. and
IV. give practically the same result (0.082 and 0.077); in both cases the skewness is
small. The areal deviations are in the three cases respectively: 4.4 sq. centims.,
5.9 sq. centims., and 6.7 sq. centims., or we have mean percentage errors in frequency
of 4.4, 5.9, and 6.7 nearly; the percentage error for the closest point binomial is 10.5.
We thus conclude that even in a case which has been selected as the most typically
symmetrical series of measurements out of a very considerable set of careful statistics,
the generalised probability curve is one-third as good again as the normal curve,
while the special case of that generalised probability curve, which is not the most
appropriate to our observations, is itself distinctly better than the normal curve.
This result has been confirmed by a considerable application of these generalised
curves; in good cases of normal curve fitting, the generalised curves are always
sensibly better; in cases where the normal curve is almost useless, as in the case of
barometric observations, the new curve, if of the appropriate type, will represent with
a 4 to 5 per cent. mean accuracy many observations not yet reduced to statistical
theory. It is, perhaps, unnecessary to repeat that this mean percentage is much less
than the average of what has been allowed to pass muster hitherto in both physical
and biological measurements. Professor Edgeworth's view* thus seems untenable; a
curve with a comparatively easy theory of its constants has been found which excels
the accuracy of the hitherto adopted normal curve. And this for the simple reason
that it would pass into the normal curve, if that curve were itself the best fit.

(23.) Example III. The following statistics of height for 25,878 recruits in the
United States Army, are given by J. H. Baxter, 'Medical Statistics of the
Provost-Marshal-General's Bureau,' vol. 1, Plate 80, 1875.

Height (inches).   Number.      Height (inches).   Number.
78-77                  2        64-63               1947
77-76                  6        63-62               1237
76-75                  9        62-61                526
75-74                 42        61-60                 50
74-73                118        60-59                 15
73-72                343        59-58                 10
72-71                680        58-57                  6
71-70               1485        57-56                  7
70-69               2075        56-55                  3
69-68               3133        55-54                  1
68-67               3631        54-53                  2
67-66               4054        53-52                  1
66-65               3475        52-51                  1
65-64               3019

* 'Phil. Mag.,' vol. 24, p. 334, 1887.
I find:

Mean height = 67.2989 in.
Standard deviation = 2.5848 in.
Maximum ordinate = 3994.04.

This gives a very close-fitting normal curve.
The data for a generalised curve are

$$\mu_2 = 6.68122, \quad \mu_3 = -1.31168, \quad \mu_4 = 135.02324,$$

$$\beta_1 = 0.005769, \quad \beta_2 = 3.024801.$$

Thus,

$$2\beta_2 - 3\beta_1 - 6 = 0.032295;$$

and being positive, we see the curve belongs to Type IV. There is, thus, exactly as in
the previous example of crab measurements, no range of a limited character for these
statistics of height.* For a true normal curve, $\beta_1$, $\beta_2$ ought to be 0 and 3
respectively; we have therefore a still closer approach (3.025) than in the case of the crabs
(3.128) to normality. In this case $r$ is about 400, and on any reasonable scale, there
is no sensible difference between the normal and the generalised curves. The skewness
is very slight, = 0.038 about, or about half its value in the case of the crabs.

(24.) Example IV. Height of 2192 St. Louis School Girls, aged 8. The following
statistics are given by W. T. Porter, "The Growth of St. Louis Children," 'Trans.
of Acad. of Sci. of St. Louis,' vol. 6, p. 279, 1894.

Heights at intervals                Heights at intervals
of 2 centims.          Number.      of 2 centims.          Number.
141 and 142               1         119 and 120              342
139 and 140               0         117 and 118              321
137 and 138               1         115 and 116              297
135 and 136               5         113 and 114              222
133 and 134              10         111 and 112              137
131 and 132              21         109 and 110               84
129 and 130              28         107 and 108               42
127 and 128              79         105 and 106               27
125 and 126             138         103 and 104                8
123 and 124             183         101 and 102                2
121 and 122             243          99 and 100                1

The following are the calculated values of the constants†:

* If, notwithstanding, we take a curve of Type III., we find the range limited on the 'dwarf' side
at about .7645 in.

† The unit of all these constants = 2 centims., except in the case of the mean height. The
standard deviation = 5.55244 centims., which gives a probable deviation of 3.745 centims. The mean

Mean height = 118.271 centims.,
Standard deviation = 2.77622,
$y_0$ for normal curve = 314.99,

$$\mu_2 = 7.70739, \quad \mu_3 = 2.38064, \quad \mu_4 = 192.17419,$$

$$\beta_1 = 0.0123784, \quad \beta_2 = 3.235045.$$

Thus $2\beta_2 - 3\beta_1 - 6$ is positive, and the curve is again of Type IV.

We have

$$d = 0.135606, \quad \text{Skewness} = 0.04885,$$

$$r = 30.8023, \quad m = 16.4011,$$

$$\nu = 4.56967, \quad a = 14.9917,$$

$$y_0 = 235.323,$$

or, for the equation to the curve:

$$x = 14.9917 \tan\theta,$$

$$y = 235.323\, \cos^{32.8023}\theta\; e^{-4.56967\,\theta},$$

the axis of $x$ being positive towards dwarfs and the origin 2.2241 on the positive side
of the centroid-vertical.

The maximum ordinate = 324.18 and occurs at $x = -2.0884$.

The curve of Type IV., together with the normal curve, is drawn (Plate 10, fig. 7). 

If we attempt to fit a curve of Type III., we find $p$ about 322.14, and the range
limited on the dwarf side at about 99.812 centims. from the mean, or at a height of
about 18.5 centims. The largeness of $p$ causes this curve to coincide with the normal
curve to the scale of our diagram. The areal deviations are for the curve of Type IV.
and for the normal curve 6.1 and 8.3 centims., giving percentage mean errors of 5.56
and 7.66 in the ordinates respectively. The advantage is again on the side of the
generalised curve. It will be seen at once that the normal curve by no means well
represents the number of girls of giant height. The theoretical probability that
these giants should occur is small, and their actual redundancy over the numbers
indicated by the normal curve suggests some peculiarity in this direction; it is fully
met by the curve of Type IV. The asymmetry of the curves given by anthropometrical
measurements on children has been noted both by Bowditch* and Porter,†
but in their published papers, to which I have had access, they do not give their
raw material, only the ogive curve arising from Galton's method of percentiles.
Unfortunately, theoretical evaluation of the skewness of anthropometric statistics
can only be applied or verified when we have raw material, and not integral frequency
height and probable deviation, as given by Mr. Porter, are 118.36 and 3.698. The latter is obtained
from the mean deviation, but I do not know how the former is to be accounted for.

* 'Growth of Children, studied by Galton's Method of Percentiles.' Boston, 1891, p. 496.

† 'Growth of St. Louis Children.' St. Louis, 1894, p. 299.
curves, the integral of the frequency in all suggested forms of the frequency curve
being not expressible in terms of undetermined constants. Valuable as is the
method of percentiles for representing popularly the numerical facts of anthropometry,
it is to be regretted that percentile statistics are replacing the raw material
in so many publications. The raw material of Professor Weldon's crab-measurements
and Bowditch and Porter's child-measurements ought to be preserved and
circulated in print, as a means of developing and testing statistical theory.

(25.) Example V. Length-Breadth Index of 900 Bavarian Skulls. The following
statistics are taken from Tables I.-VI., VIII.-X., inclusive, of J. Ranke's 'Beiträge
zur physischen Anthropologie der Baiern, München, 1883.' They include all the
material, which may be treated as typically "Alt-Baierisch," both male and female
skulls.

Index.   Frequency.     Index.   Frequency.     Index.   Frequency.
70          1           80         71.5         90         10
71          1           81         82           91          8
72          0           82        116           92          3
73          2.5*        83         98           93          1.5
74          1.5         84        107           94          2
75          8.5         85         82           95          1.5
76         12.5         86         74           96          0
77         17           87         68           97          0
78         37           88         34.5         98          1
79         65           89         19           99          0

We find, as before,

Position of centroid-vertical, 83.07111,

$$\sigma = 3.468, \quad y_0 = 103.532 \text{ (for normal curve)},$$

$$\mu_2 = 12.027166, \quad \mu_3 = 3.707179, \quad \mu_4 = 527.91696,$$

$$\beta_1 = 0.0078995, \quad \beta_2 = 3.649553,$$

$$d = 0.111388, \quad r = 12.42734, \quad \text{Skewness} = 0.0321186,$$

$$m = 7.21367, \quad \nu = 0.853771, \quad a = 11.69583, \quad y_0 = 107.4706.$$

Thus we see that the curve is again of Type IV. This result seems of considerable 
significance, but it requires, of course, wider examination of cases than I have yet 
been able to make. But, so far as I have gone, in both anthropometric and 
biological statistics, whether relative or absolute measurements of organs, the
frequency curves all deviate from the normal curve, however slight the deviation,
in the direction of Type IV. That is to say, the distribution of chances upon which
the frequency of variation of an organ depends, appears to resemble the drawing of a 



* Indices such as 73.5 have been divided between the 73 and 74 groups.

limited amount from a limited mixture. So far as this goes, it is evidence against
the usual hypothesis that in biological matters the chances of deviations on either
side of the mean are equal, and the "contributory causes" independent and
indefinitely great in number. Thus we appear in biological statistics to be dealing
with a chance system corresponding, not to a binomial, but to a hypergeometrical
series, such as that discussed in § 11.

If it be remarked that Type IV. dismisses at once the problem of range from 
biological investigations, we must notice that, while this is theoretically correct so 
long as we are dealing with the continuous curve by which we replace the hyper- 
geometrical series, it is not true the moment we fall back from the curve on the point 
series (see p. 361). If the r of that page (or the qn) be an integer, the series is limited 
in range. It seems very possible that discreteness, rather than continuity, is charac- 
teristic of the ultimate elements of variation; in other words, if we replaced the curve 
by a discrete series of points, we should find a limited range. It is the analytical 
transition from this series to a closely fitting curve which replaces the limited by an 
unlimited range. Exactly the same transition occurs when we pass from the sym- 
metrical point binomial to the normal curve. Thus, while Type I. marks an absolutely 
limited range, the occurrence of Type IV. does not necessarily mean that the range
is actually unlimited.*

For the equation to the curve we have

$$x = 11.69583 \tan\theta,$$

$$y = 107.4706\, \cos^{14.42734}\theta\; e^{-0.853771\,\theta},$$

the origin being at a distance 0.803515 on the positive side of the centroid vertical.

The normal curve as well as the curve of Type IV. are shown (Plate 11, fig. 8). The
result in both cases is quite good for this type of statistics, i.e., the skulls came from
eight different districts and include 100 female skulls. With the planimeter the areal
deviation in both cases = 6.8 square centims., giving in either case an average
percentage error of 7.56. That the generalised curve does not in this case give a
decidedly better result than the normal curve I attribute to the heterogeneity of the
material. It clearly accounts better for the extreme dolichocephalic and
brachycephalic skulls than the normal curve. The same 900 skulls have been fitted with a
normal curve by Stieda,† but neither the constants of his normal distribution nor
* I reserve for the present the fitting of hypergeometrical point series to statistical results. The
discussion is related to curves of Type IV., as the fitting of point binomials to curves of Type III. It
will, I think, throw considerable light on the nature of chance in the field of biological variation,
especially with regard to limitation of the material to be drawn upon, to which I referred above, and
which, I believe, finds confirmation in skull statistics.

† "Ueber die Anwendung der Wahrscheinlichkeitsrechnung in der anthropologischen Statistik,"
'Archiv für Anthropologie,' Bd. 14. Braunschweig, 1882.
his plotting of Ranke's observations agree with mine. He has added together under
83, for example, all indices from 83 to 83.9. Thus, for the indices 81, 82, 83, 84 he
gives the frequencies 106, 92, 111, 99, while I find 82, 116, 98, 107, a very sensible
difference.* Stieda's method can introduce very sensible errors. In this particular
case it transfers the maximum frequency of observation from 82 to 84.

The last four examples have dealt with cases where the statistician has hitherto 
been content to assume symmetry. They have been given to indicate (i.) an 
apparently uniform trend in biological statistics of variation, and (ii.) the improved 
fitting of theory to practice which arises from using the generalised curve. I now 
pass to cases of obvious skewness, where the statistician has hitherto had no satis- 
factory theory. 

(26.) Example VI. Distribution of 8689 Cases of Enteric Fever Received into the
Metropolitan Asylums Board Fever Hospitals, 1871-93.

Age.        Number of cases.      Age.        Number of cases.
Under 5          266              35-40            299
5-10            1143              40-45            163
10-15           2019              45-50             98
15-20           1955              50-55             40
20-25           1319              55-60             14
25-30            857              Above 60          13
30-35            503

I considered that the 13 cases "above 60" might be distributed as follows: 60-65,
8; 65-70, 4; 70-75, 1.

Taking five years as the unit I found

$$\mu_2 = 4.070554, \quad \mu_3 = 7.598196, \quad \mu_4 = 69.379605.$$

The centroid-vertical is at 18.9691 years, i.e., 0.29382 unit from the mid-point of the group 15-20.
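
The position of the centroid and the lower moments may be recomputed from the grouped table as a check. In the sketch below the function name is hypothetical, each group is concentrated at its mid-point, and one-sixth of the squared unit is added to the raw second moment; that correction is an assumption made here because it reproduces the printed value of the second moment, not a rule stated in this section. No correction is applied to the third moment.

```python
def grouped_central_moments(counts, first_mid, width):
    """Return (n, mean, m2, m3): mean in original units, m2 and m3 in class-width units."""
    n = sum(counts)
    mids = [first_mid + i * width for i in range(len(counts))]
    mean = sum(f * x for f, x in zip(counts, mids)) / n
    devs = [(x - mean) / width for x in mids]
    m2 = sum(f * d ** 2 for f, d in zip(counts, devs)) / n
    m3 = sum(f * d ** 3 for f, d in zip(counts, devs)) / n
    return n, mean, m2, m3

# Five-year age groups from the table; the 13 cases above 60 are split 8, 4, 1
counts = [266, 1143, 2019, 1955, 1319, 857, 503, 299, 163, 98, 40, 14, 8, 4, 1]
n, mean, m2, m3 = grouped_central_moments(counts, first_mid=2.5, width=5.0)
mu2 = m2 + 1.0 / 6.0  # assumed grouping correction (see the note above)
print(n, round(mean, 4), round(mu2, 6), round(m3, 6))
```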
Thus $2\mu_2(3\mu_2^2 - \mu_4) + 3\mu_3^2 = 13.05102$, or the curve is of Type I. Since, however,
$3\beta_1 - 2\beta_2 + 6 = 0.1935$ is small, a curve of Type III. will also be a good fit.
We have for the other constants

$$r = 72.28642, \quad \epsilon = 259.78912, \quad b = 77.28312,$$

$$m_1 = 2.79291, \quad m_2 = 67.49351,$$

$$a_1 = 3.07801, \quad a_2 = 74.20511,$$

$$y_0 = 1890.83, \quad d = 0.98643, \quad \text{Skewness} = 0.488922.$$


* I class as 83 all from 82.6 to 83.4, dividing 82.5 between 82 and 83 evenly, and 83.5 between 83
and 84 evenly. Thus in the Table above certain frequencies will be found with such values as 12.5 or
71.5 skulls.
Thus we have for the curve of Type I.

$$y = 1890.83\,(1 + x/3.07801)^{2.79291}\,(1 - x/74.20511)^{67.49351},$$

where the centroid is 0.98643 unit from axis of $y$.
The curve of Type III. is

$$y = 1894[\ldots]\,(1 + x/3.428094)^{3.673043}\, e^{-3.673043\,x/3.428094}.$$

The centroid is in this case 0.933313 unit on the positive side of the origin and the
skewness = 0.462594.

It will be noticed that the curve of Type I. extends 0.2706 unit or 1.353 years,
and the curve of Type III. 0.5676 unit or 2.838 years before birth. In both cases
the chances of an "antenatal" death from enteric fever are very, very small. Curve
of Type I. is in this respect better than the curve of Type III. The latter curve
gives no maximum limit, the former a limit of about 77 units or 385 years. In both
cases, however, the chances of a case of enteric fever with the subject over 100 years
are vanishingly small. These statistics of enteric fever thus set a maximum limit to
the duration of life, but it is a limit so high as to have little suggestiveness.
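
The limits quoted for the two curves follow from the constants by simple arithmetic, which may be sketched as below. The function names are hypothetical; the centroid is taken at 18.9691/5 = 3.79382 units from birth, and the remaining constants are those printed for the curves of Types I. and III.

```python
def type_i_limits(mean_units, d, a1, a2, unit_years=5.0):
    """Ages (in years) at which a Type I curve with origin at the mode begins and ends."""
    mode = mean_units - d
    return (mode - a1) * unit_years, (mode + a2) * unit_years

def type_iii_start(mean_units, d, a, unit_years=5.0):
    """Age (in years) at which a Type III curve with origin at the mode begins."""
    return (mean_units - d - a) * unit_years

lower, upper = type_i_limits(3.79382, 0.98643, 3.07801, 74.20511)
print(round(lower, 3), round(upper, 1))                        # Type I: start and end
print(round(type_iii_start(3.79382, 0.933313, 3.428094), 3))   # Type III: start
```

Negative ages denote years before birth.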

In order to see what is the nature of the difference made, when we suppose the
liability to enteric fever to commence with birth, I will treat these statistics as a
case falling under § 16.

If then $\mu'_1$, $\mu'_2$, and $\mu'_3$ be the first three moments about the vertical through
0 years we have

$$\mu'_1 = 3.79382, \quad \mu'_2 = 18.46362, \quad \mu'_3 = 108.53175,$$

$$\chi_2 = 1.282813, \quad \chi_3 = 1.549399,$$

$$u = 0.030435, \quad v = 0.321856,$$

$$m_1 = 2.14296, \quad m_2 = 28.71414,$$

$$b = 40.1206, \quad y_0 = 1873.39,$$

$$a_1 = 2.78629, \quad a_2 = 37.33431,$$

whence we have for the curve

$$y = 1873.39\,(1 + x/2.78629)^{2.14296}\,(1 - x/37.33431)^{28.71414}.$$

Here the duration of life is 200 years about, and the maximum incidence of the
disease is at 13.93 years.

Lastly for the normal curve, we have the constants $\sigma = 2.01756$ units = 10.0878
years and $y_0 = 1718.12$.

All the above four curves are drawn, Plate 12, fig. 9.
We see at once that the normal curve is perfectly incapable of expressing statistical
results like these. It gives an average error in the ordinate of 25.8 per cent. and no
less than 260 antenatal deaths!

For the remaining three curves we have the following results:

                                       Percentage error
                                       in ordinate.        Antenatal cases.
Curve of Type I. (closest fit)              5.75                 3
Curve of Type I. (starting at birth)        7.3                  0
Curve of Type III.                          5.98                 9

The percentage errors here are well within those usually passed by statisticians.
If they are slightly larger than what we have found in previous cases the source of
the error is not far to seek. We have combined both male and female cases, but the
distributions of enteric fever for both sexes are not the same. The fever curves for
either sex differ in some cases markedly, although less for enteric fever than for
diphtheria, for example. We have thus, in reality, a compound curve. I have found
for about 700 male cases only a percentage error of about 5.*

Another point needing notice is the question of antenatal cases, which may at first
strike the reader as absurd. The closest fitting curve of Type I. runs, as we have seen,
1.35 years about before birth, and gives three antenatal cases. Three antenatal cases
(or, indeed, 9 in the case of the curve of Type III.) is a very small percentage of
8689 cases, and not of importance from the statistician's standpoint. But the fact
that a curve starting before birth gives a better fit than one starting at birth, is
significant, and there is every probability that a curve starting from about -0.75 year
would give a still less percentage error than one from 1.35 year or from birth.†

In dealing with mortality curves for infancy I have found it impossible to get good
fitting theoretical curves, without carrying these curves backward to a limit of
something less than a year. The "theoretical" statistics thus obtained of antenatal
deaths seem to be fairly well in accordance with the actual statistics of maternity
charities. In vital statistics therefore we must be prepared in most diseases for small
percentages of antenatal cases and antenatal deaths, and it is just possible that theory
in this matter will be able to indicate lines of profitable inquiry to the medical
statistician.

(27.) Example VII. As an example of the method of Section 15, I take the
following statistics of guessing a tint. Nine mixtures of black and white were taken, 



* I propose on another occasion to deal with the age distribution of fever cases. My object at present 
is only to give typical illustrations of the method of calculating skew curves. 

† In fact the case of a pregnant woman with enteric fever is to be considered as a case also of
antenatal enteric fever.

so as to get a series of tints in arithmetical progression 1, 2, 3, 4, 5, 6, 7, 8, and 9.
These tints were then placed in non-consecutive order, and 231 persons asked to guess
a tint by affixed letters lying between 1 and 9. The results were as follows:

Tint.    Frequency of guess.      Tint.    Frequency of guess.
  1             0                   6            54
  2             8                   7            94
  3             7                   8            40
  4             6                   9             0
  5            22

Now, obviously, the number of tints and the number of persons guessing were far
too limited to draw any definite conclusions as to the distribution of tint guesses.*
I propose here merely to use these statistics to illustrate the calculation of a skew
frequency curve with a given limited range. I do not wish to propound any theory 
of tint guessing, nor to assert that these guesses actually distribute themselves 
according to the curves dealt with in this paper. 

Calculating the moments about the centroid in the usual manner, we have

$$\mu_2 = 2.1417, \qquad \mu_3 = -3.70067, \qquad \mu_4 = 19.6255.$$

Centroid lies at a distance of 5.376624 units from the tint 1.

We easily find $2\mu_2(3\mu_2^2 - \mu_4) + 3\mu_3^2 = 15.96335$, or the observations fall into a
curve of Type I., that is to say, have a limited range.
We obtain

$$\beta_1 = 1.39407, \qquad \beta_2 = 4.27862,$$
$$r = 6.95847, \qquad \epsilon = 6.443186,$$

hence the range

$$b = 11.31768.$$

Further

$$m_1 = 4.858705, \qquad m_2 = .099765,$$
$$a_1 = 11.08997, \qquad a_2 = .22769,$$
$$d = 1.561012, \qquad \text{Skewness} = 1.06666.$$

Thus the range of the theoretical curve runs from a point 4.15233 units before
tint 1, and concludes at a point .834674 unit before tint 9. The curve is, however,
practically insensible before tint 1. Considering the roughness of the experimental
method, the obtaining an actual range of about 11 instead of 8, and its covering very
nearly the range of 8, must be held to be fairly encouraging for the method. I shall
accordingly calculate the constants of the curve on the assumption that the range lies
between Tints 1 and 9, using the method of § 15.

* I hope later to deal with the subject of tint guesses falling within a limited range, as my material
increases in bulk. I would only note here, that the geometrical mean frequency curve does not seem to
give results according well with experiments.
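The constants quoted for the tint statistics can be checked numerically. The following is a minimal sketch, assuming the expressions for r, ε, the range b and the exponents that are given in the earlier sections of the memoir (they are not restated in this section); the function name and dictionary keys are illustrative only. Small differences in the last figures arise because the moments are entered to the printed number of places.

```python
import math

def type1_constants(mu2, mu3, mu4):
    """Type I constants from moments about the centroid (assumed formulas)."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    # Criterion quoted in the text: positive means a curve of limited range.
    criterion = 2 * mu2 * (3 * mu2 ** 2 - mu4) + 3 * mu3 ** 2
    r = 6 * (beta2 - beta1 - 1) / (3 * beta1 - 2 * beta2 + 6)
    eps = r ** 2 / (4 + 0.25 * beta1 * (r + 2) ** 2 / (r + 1))
    b = 0.5 * math.sqrt(mu2) * math.sqrt(beta1 * (r + 2) ** 2 + 16 * (r + 1))
    # The exponents are the roots of m^2 - (r - 2) m + (eps - r + 1) = 0.
    root = math.sqrt((r - 2) ** 2 - 4 * (eps - r + 1))
    m1, m2 = (r - 2 + root) / 2, (r - 2 - root) / 2
    # The range is divided about the mode in the ratio of the exponents.
    a1, a2 = b * m1 / (m1 + m2), b * m2 / (m1 + m2)
    # Distance of mode from centroid, and skewness = d / standard deviation.
    d = 0.5 * math.sqrt(mu2 * beta1) * (r + 2) / (r - 2)
    return {"beta1": beta1, "beta2": beta2, "criterion": criterion,
            "r": r, "eps": eps, "b": b, "m1": m1, "m2": m2,
            "a1": a1, "a2": a2, "d": d, "skewness": d / math.sqrt(mu2)}

tint = type1_constants(2.1417, -3.70067, 19.6255)
for name, value in tint.items():
    print(f"{name:10s} {value:.5f}")
```

Run on the three moments above, the sketch returns values agreeing with the printed constants to about three decimal places.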
We find

$$\mu_1' = 2.623376, \qquad \mu_2' = 9.023803,$$
$$\gamma_1 = .327922, \qquad \gamma_2 = .429971.$$

Whence

$$m_1 = 2.75412, \qquad m_2 = .83172,$$
$$a_1 = 6.144435, \qquad a_2 = 1.855565,$$

and

$$y_0 = 59.5996.$$

Thus we may take for the curve

$$y = 59.6\left(1 + \frac{x}{6.144435}\right)^{2.75412}\left(1 - \frac{x}{1.855565}\right)^{.83172}.$$

The curve is figured, Plate 11, fig. 10, with the first "smooth" of the observations.
It will be seen to give the general character of the distribution, but much more elaborate
experiments would be required before any statement could be made as to whether
frequency of tint guesses really does follow a curve with limited range of Type I.
On the same plate the frequency of 128 guesses distributed over 18 tints is given;
the approximation to a curve of Type I. is fairly close considering the paucity of
guesses.
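The calculation with a prescribed range can be sketched as follows. This is a hedged reconstruction, not a transcription of § 15: it assumes that the moments 2.623376 and 9.023803 are taken about the tint-9 end of the range, that γ1 = μ1'/b and γ2 = μ2'/(b μ1') (these ratios reproduce the printed .327922 and .429971), and that the exponents then follow from the usual mean relations of a curve of the form u^m(1 − u)^m' on the unit range. The function name and the "near"/"far" labels are illustrative.

```python
import math

def fit_type1_known_range(mu1_end, mu2_end, b, total):
    """Fit y0 (1 + x/a_far)^m_far (1 - x/a_near)^m_near on a range of length b.

    mu1_end, mu2_end: first and second moments about the 'near' end of the range.
    """
    g1 = mu1_end / b
    g2 = mu2_end / (b * mu1_end)
    s = (1 - g2) / (g2 - g1)          # equals m_near + m_far + 2
    m_near = g1 * s - 1
    m_far = (1 - g1) * s - 1
    # The mode divides the range in the ratio of the exponents.
    a_near = b * m_near / (m_near + m_far)
    a_far = b - a_near
    # y0 from the condition that the whole area equals the number of observations.
    log_beta = (math.lgamma(m_near + 1) + math.lgamma(m_far + 1)
                - math.lgamma(m_near + m_far + 2))
    unit_area = (b * (b / a_near) ** m_near * (b / a_far) ** m_far
                 * math.exp(log_beta))
    return {"g1": g1, "g2": g2, "m_far": m_far, "m_near": m_near,
            "a_far": a_far, "a_near": a_near, "y0": total / unit_area}

tint_fit = fit_type1_known_range(2.623376, 9.023803, 8.0, 231)
for name, value in tint_fit.items():
    print(f"{name:7s} {value:.6f}")
```

Under these assumptions the sketch returns exponents, distances from the mode and a maximum ordinate in agreement with the values printed above (m1, m2, a1, a2 and y0).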

(28.) Example VIII. The question may be raised, how are we to discriminate between
a true curve of skew type and a compound curve, supposing we have no reason
to suspect our statistics a priori of mixture. I have at present been unable to find any
general condition among the moments, which would be impossible for a skew curve
and possible for a compound, and so indicate compoundedness. I do not, however,
despair of one being found. It is a fact, possibly of some significance, that the best
fitting skew curve to several compound curves that I have tested is a curve of
Type I., and not that of Type IV., which appears to be the more usual type in
biological statistics. Taking, as an example, the statistics for the "foreheads" of Naples
crabs due to Professor Weldon, and resolved into their components in my memoir,
'Phil. Trans.,' A, vol. 185, p. 85 et seq., I find for the best fitting skew curve the
equation

/ X \ 14-77264 / r.. \-i-0469 

where the origin is at 1.4274 horizontal units from the centroid-vertical in the
positive sense of the horizontal scale. If, now, we place this skew curve and the
compound curve of Plate 1, 'Phil. Trans.,' vol. 185, on top of the observations (see
Plate 13, fig. 11), we see at once how much better is the fit of the compound curve.
The skew curve gives a mean percentage error in the ordinates of 10.45, the compound
curve of only 7.4. The determination of the best skew curve, when the compound
curve is known, is easy, for all its details are already practically calculated.

A criterion of whether a compound or skew curve is to be sought for ab initio,
would be, however, of great value.

(29.) Example IX. A more markedly skew curve than any we have yet dealt
with is that giving the frequency of divorce with duration of marriage. I take my
statistics from a paper by Dr. W. F. Willcox, entitled "The Divorce Problem,
a Study in Statistics" ('Studies in History, Economics, and Public Law,' Columbia
College, vol. 1, p. [illegible]). They are as follows:




Duration of marriage   Divorces        Duration of marriage   Divorces
in years.              (1882-6).       in years.              (1882-6).
     1                   5314               12                  4089
     2                   7483               13                  3563
     3                   9426               14                  3144
     4                   9671               15                  2931
     5                   9014               16                  2721
     6                   8274               17                  2217
     7                   7021               18                  1877
     8                   6093               19                  1577
     9                   5305               20                  1459
    10                   5002               21 and over         9401
    11                   4384

Total number of divorces granted, 109,966.



Now these statistics suffer from a defect common to many of the class, the want
of careful enumeration of the frequencies near the beginning and end of the series.
It cannot be too often insisted upon that careful details of the frequencies in the
start and finish of the distribution are requisite if we are to fit skew distributions
with their appropriate skew curves. How, in this case for example, are we to
distribute the 9401 divorces which occur after 21 years of married life? How, on
the other hand, does the curve start? It is impossible to place 5314 divorces at the
mean, 6 months, of the one year duration. It is obvious that the applications for
divorce will be far more numerous in the last half-year than the first half-year of
matrimony. The very time required to institute legal proceedings and get a divorce
granted must ensure this if nothing else did. Yet these two tails of 5314 and 9401,
of which the accurate distributions are not given, are between 1/7 and 1/8 of the total
number of divorces, and until we know how they are exactly distributed, we cannot
hope for the very exact fitting of a theoretical curve.

In order to make the best of the "tails" under the circumstances, their moments
were calculated on two hypotheses, (i.) that they were triangles, (ii.) that they were
logarithmic curves, and the mean of these extreme results taken.

I found

$$\mu_2 = 60.7376, \qquad \mu_3 = 809.15,$$
$$\gamma = .150127, \qquad p = .36891.$$

Distance of centroid from start of curve = 9.1183,
Distance of maximum from start of curve = 2.4573,
$y_0$ = maximum frequency = 8882.45.

Here the curve is assumed, owing to the obviously long tail to the right and the
abrupt start to the left, to be of Type III. Its equation is accordingly

$$y = 8882.45\left(1 + \frac{x}{2.4573}\right)^{.36891} e^{-.150127\,x}, \qquad \text{Skewness} = .8547.$$



The curve is figured, Plate 11, fig. 12, and will be seen to rise abruptly at about .47
of a year's duration. It may be doubted whether legal proceedings even in America
are so rapid that a divorce suit can be complete within six months of marriage. The
curve gives fairly well the general form of the frequency statistics. Could the
moments have been determined with greater accuracy, most probably a better fit
would have resulted. As it is, the mean percentage error is above 6.
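The Type III constants of this example follow from the second and third moments alone. The sketch below is offered under stated assumptions: it takes γ = 2μ2/μ3 and p = 4μ2³/μ3² − 1 (relations which reproduce the printed γ and p), places the maximum at p/γ from the start of the curve, and obtains y0 by equating the whole area to the 109,966 divorces. The function name is illustrative.

```python
import math

def type3_constants(mu2, mu3, total):
    """Constants of y = y0 (1 + x/a)^p exp(-gamma x), origin at the maximum."""
    gamma = 2 * mu2 / mu3
    p = 4 * mu2 ** 3 / mu3 ** 2 - 1
    a = p / gamma                         # start of curve to maximum
    centroid_from_start = (p + 1) / gamma
    skewness = (centroid_from_start - a) / math.sqrt(mu2)
    # Whole area of (1 + x/a)^p exp(-gamma x) for x >= -a.
    unit_area = math.exp(p) * math.gamma(p + 1) / (gamma * p ** p)
    return {"gamma": gamma, "p": p, "a": a,
            "centroid_from_start": centroid_from_start,
            "skewness": skewness, "y0": total / unit_area}

divorce = type3_constants(60.7376, 809.15, 109966)
for name, value in divorce.items():
    print(f"{name:20s} {value:.5f}")
```

The maximum ordinate so obtained differs from the printed 8882.45 only by a unit or two, which is within what the rounding of the moments would produce.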

(30.) Example X. A still more extreme case may be selected from the field of
economics. I take the following numbers from the 1887 Presidential Address of
Mr. Goschen to the Royal Statistical Society ('Journal,' vol. 50, Appendix II.,
pp. 610-2). I have grouped together both houses and shops, because the details of
the two are not in Mr. Goschen's returns separated for values under £20.

Valuation of House Property, England and Wales, years 1885 to 1886.

Value.             Number of houses.      Value.               Number of houses.
Under £10             3,174,806           £80 to £100               47,326
£10 to £20            1,450,781           £100 to £150              58,871
£20 to £30              441,595           £150 to £300              37,988
£30 to £40              259,756           £300 to £500               8,781
£40 to £50              150,968           £500 to £1,000             3,002
£50 to £60               90,432           £1,000 to £1,500           1,036
£60 to £80              104,128

Here clearly the curve starts with the maximum frequency, and further, to any
scale to which the curve can be drawn, it tails away indefinitely to the right. This
justifies us in the assumption that the curve will be fairly approximated to by a form
of type

$$y = y_0\, x^{p} e^{-\gamma x},$$

where p would turn out to be a negative quantity lying numerically between 0 and 1. But the
details given us of the start and finish of the curve are far too scanty to allow us to 
proceed by moments. In the first place, to measure an element of area of the 
frequency curve by an element of value into its mid-ordinate is perfectly legitimate 




at such a point as B ; it fails entirely, however, at such a point as A, which includes 
the part of the curve which is asymptotic to the ordinate of maximum frequency. 
The area at such a point is much greater than the element into the mid-ordinate, 
and the calculation of moments on the assumption that 3,174,806 houses may be 
concentrated at £5, is purely idle. The ordinate obtained from the area in this 
manner may often differ 30 per cent. from the true ordinate, and yet about three-fifths
of the total number of houses fall into this first group.

Further, treating the area as ordinate into element of value is also true only if the
element of value be small. For "elements" such as £150, £200, or even £500, which
are all that are given in the tail of these statistics, it is perfectly idle to concentrate
the area at the mid-ordinate. The centroid of a piece of tail such as the accompanying
figure suggests lies far to the left of the mid-ordinate. In other words, to attack the
problem by the method of moments, we require to have the "tail" as carefully
recorded as the body of statistics. Unfortunately the practical collectors of statistics
often neglect this first need of theoretical investigation, and proceed by a method of
"lumping together" at the extremes of their statistical series.

Still three further points in regard to the present series of statistics. First, they are
very unlikely to be homogeneous. Houses with an annual valuation of over £300
hardly fall under the same series of causes as the bulk of houses in the kingdom
which fall under £100. Secondly, when we are told that 3,174,806 houses are valued
under £10, it can hardly mean that any houses are valued at 0, certainly not the
maximum number. Hence our frequency curve in theory must not be expected to
rise from zero, but from some point between 0 and £10, which corresponds to the
customary minimum at which a cottage can be rented.

Lastly, there is one special cause at work tending to upset, about the value of £20, 
the general distribution due to a great variety of small causes. This is the value at 
which taxation commences, and we should expect a larger proportion of houses to be 
built just under the taxable value than is given by a chance distribution. 

Notwithstanding the many disadvantages of these results, I determined to obtain
if possible a skew curve approximating to the main portion of the distribution. I took
£10 as my unit of value and 1000 houses as my unit of frequency. I started with
the ordinary method of moments, concentrating each area at its centroid as given by
the total valuation of the group, also recorded by Mr. Goschen, and found a curve of
the type

$$y = y_0\, x^{p} e^{-\gamma x},$$

with

$$p = -.65448, \qquad \gamma = .2003.$$

This was so far satisfactory that it showed even by this rough method that p was
negative, and numerically between 0 and 1. Thus the theoretical curve gave an infinite ordinate,
but finite area at its start.

A laborious method of trial and error was then adopted, and by varying p and γ
slightly, as well as $y_0$ and the origin of the curve, I sought to improve the fit given
by the rough method (in this case) of moments. The fundamental consideration was
to keep the total areas under £100 value as nearly as possible the same in the
theoretical curve and the statistics. This portion of the curve I treated as practically
referring to homogeneous material. Ultimately I found the following curve:

$$y = 1388.32\, x^{-.690077} e^{-.3057256\,x},$$

with the origin at .45 unit from zero. Thus the minimum annual valuation was
£4 10s., or, to a weekly valuation, of 1s. 7[?]d. This would connote probably a weekly
rental of 1s. 8d. to 2s. The total area of this theoretical curve was 5795 in thousands
of houses; of these 5729 had a valuation under £100 and 66 over £100; the corresponding
numbers for the statistics themselves are 5720 and 110. The additional 44
over £100 I assume to be due to the heterogeneity of the statistics, high values
corresponding to blocks of chambers, large hotels and other buildings hardly falling
into the same category as the small house under £100 in value. Unfortunately the
"tail" of the statistics is so defectively recorded that there is no hope of reaching a
separate distribution for this high class property.

Returning now to the curve and statistics, we have the following comparative
results:



Value (£).          Number of 1000's of houses.
                    Theory.      Statistics.
Under 10 }
10-20    }           4625          4626
20-30                 452           442
30-40                 253           260
40-50                 153           151
50-60                  97            90
60-80                 102           104
80-100                 46            47
Above 100              66           110


The general accordance here is very marked, the chief divergences being accounted
for by the special causes to which we have referred above, i.e., (i) the crowding of
houses just below the limit of taxation, and (ii) the divergent character of the causes
at work determining the frequency of low and high class house property.

The results are depicted, Plate 14, fig. 13.

It will be observed that so far as the observations can be plotted to the theoretical
curve, it leaves little to be desired. The histogram* shows, however, the amount of
deviation at the extremes of the curve.
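Some of the theoretical entries for the house-valuation curve can be recomputed directly. The sketch below is a check under assumptions, not the author's working: it reads the fitted curve as y = 1388.32 x^(−.690077) e^(−.3057256 x) with x in units of £10 measured from an origin .45 unit above zero, obtains the whole area from the gamma function and the area of the first group from the usual power series for the incomplete integral, and compares the interior groups with mid-ordinates (which is what the tabulated 452 and 253 appear to be). All names are illustrative.

```python
import math

Y0, P, GAMMA, ORIGIN = 1388.32, -0.690077, 0.3057256, 0.45

def ordinate(x):
    """Ordinate of the assumed curve at distance x (units of 10 pounds) from its origin."""
    return Y0 * x ** P * math.exp(-GAMMA * x)

def area_from_origin(x, terms=60):
    """Area between the asymptotic ordinate and x, by series expansion."""
    s = P + 1
    t = GAMMA * x
    term = 1 / s
    total = term
    for k in range(1, terms):
        term *= t / (s + k)
        total += term
    return Y0 * GAMMA ** (-s) * t ** s * math.exp(-t) * total

total_area = Y0 * math.gamma(P + 1) / GAMMA ** (P + 1)   # thousands of houses
under_20 = area_from_origin(2 - ORIGIN)                   # value below 20 pounds
mid_20_30 = ordinate(2.5 - ORIGIN)                        # group 20-30, width 1 unit
mid_30_40 = ordinate(3.5 - ORIGIN)                        # group 30-40

print(round(total_area), round(under_20), round(mid_20_30), round(mid_30_40))
```

On this reading the whole area and the first two interior groups come out at the printed 5795, 452 and 253, and the area under £20 within a unit of the printed 4625.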

(31.) Example XI. Frequency curves of the type considered in Example X. are
so common that it is needful to make a few farther remarks with regard to them,
and illustrate them by further examples. Such curves occur in many economical 
instances (income tax, house valuation, probate duty), in vital statistics (infantile 
mortality), and not uncommonly in botanical statistics of the frequency of variations 
in the petals or other characteristics of flowers. 

As we have noted, the method of moments developed in this memoir cannot be 
directly applied, or only applied to obtain a first approximation to the constants 
required. This first approximation, however, will often assist us to obtain with 
quite sufficient accuracy the value of the moments of portions of the area, especially 
if the position of the initial or asymptotic ordinate is known. 

For example, consider the curve of limited range:

$$y = y_0\, x^{-p}(b - x)^{n},$$

where p lies between 0 and 1. Then if α be its area, $\alpha\mu_s'$ the $s$th moment about
the asymptotic ordinate of the area up to $x$:

$$\alpha\mu_s' = y_0\, b^{n} x^{1+s-p}\left\{\frac{1}{1+s-p} - \frac{n\,x}{b\,(2+s-p)} + \frac{n(n-1)}{1\cdot 2\,(3+s-p)}\left(\frac{x}{b}\right)^{2} - \text{\&c.}\right\}.$$

* Introduced by the writer in his lectures on statistics as a term for a common form of graphical
representation, i.e., by columns marking as areas the frequency corresponding to the range of their base.

Hence, if the range b be large and x be small, this series converges very rapidly,
and we may often take with sufficient approximation even only its first term. Thus

$$\mu_1' = x\,\frac{1-p}{2-p}, \qquad \mu_2' = x^{2}\,\frac{1-p}{3-p}, \qquad \mu_3' = x^{3}\,\frac{1-p}{4-p}, \qquad \text{nearly.}$$

Now α is given by the statistics, and we note that if p has been determined to a
first approximation by the method of moments, we can now improve the values of the
moments of the areas near the asymptotic ordinate by the use of the above
expressions.

For example, if p = .5 as a first approximation, we have

$$\mu_1' = \tfrac{1}{3}x, \qquad \mu_2' = \tfrac{1}{5}x^{2}, \qquad \mu_3' = \tfrac{1}{7}x^{3}.$$

Concentration along the mid-ordinate in the usual manner would have given us

$$\mu_1' = \tfrac{1}{2}x, \qquad \mu_2' = \tfrac{1}{4}x^{2}, \qquad \mu_3' = \tfrac{1}{8}x^{3},$$

and as the area up to a short distance from the asymptotic ordinate is generally a
considerable proportion of the total area, the above values very considerably modify
the calculated moments.
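The effect of the correction is easily tabulated. The short sketch below assumes the first-term approximation, in which the s-th moment of the area up to x about the asymptotic ordinate is x^s (1 − p)/(1 + s − p), and sets it against the values x^s/2^s obtained by concentrating the area at the mid-ordinate; the function name is illustrative.

```python
from fractions import Fraction

def first_term_ratio(s, p):
    """mu'_s / x^s for the area next the asymptote, first term of the series only."""
    return (1 - p) / (1 + s - p)

p = Fraction(1, 2)
corrected = [first_term_ratio(s, p) for s in (1, 2, 3)]
mid_ordinate = [Fraction(1, 2) ** s for s in (1, 2, 3)]

for s, (c, m) in enumerate(zip(corrected, mid_ordinate), start=1):
    print(f"s = {s}: first term {c}, mid-ordinate {m}, ratio {c / m}")
```

For p = .5 the corrected coefficients are 1/3, 1/5 and 1/7, so that the third moment of such a group is overstated by one-eighth part if the area is simply concentrated at the mid-ordinate, and the first moment is overstated by one-half.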
In the case of the curve

$$y = y_0\, x^{-p} e^{-\gamma x},$$

we have the result

$$\alpha\mu_s' = y_0\, x^{1+s-p}\left\{\frac{1}{1+s-p} - \frac{\gamma x}{2+s-p} + \frac{\gamma^{2}x^{2}}{1\cdot 2\,(3+s-p)} - \text{\&c.}\right\}.$$

Hence, as before, if γ and x be small,

$$\mu_s' = \frac{1-p}{1+s-p}\,x^{s}, \quad \text{approximately.*}$$

Results such as the above enable us to approximate fairly rapidly to the constants
of a frequency curve.

As a special example, I take the following. In 1887, Herr H. de Vries transferred
several plants of Ranunculus bulbosus to his flower garden, and counted the petals
of 222 of their flowers in the following year. He found ('Berichte der deutschen
botanischen Gesellschaft,' Jahrg. 12, pp. 203-4, 1894):

Petals         5      6      7      8      9     10
Frequency    133     55     23      7      2      2

Now the series here proceeds by discrete units, and corresponds probably to a hypergeometrical
series, but remembering how closely the results of tossing ten coins can
be represented by a normal frequency curve, I was not without hope that the areas of
a skew frequency curve would give results close to these numbers. The buttercups
start with 5 petals and run to 10. I therefore took my origin at 4.5 and determined
the constants to a second approximation in the manner above indicated. There
resulted,

$$y = .211225\, x^{-.322}(7.3253 - x)^{3.142},$$

a curve of Type I., with limited range, the asymptotic ordinate being at 4.5 petals,
or practically a distribution ranging from 5 to 11 petals.
Calculating the areas, there results,

Petals                        5       6       7       8       9      10      11
Frequency (theory)          136.9    48.5    22.6     9.6     3.4     .8      .2
Frequency (observation)     133      55      23       7       2       2       0

The agreement here is very satisfactory considering the comparative paucity of the
observations.† The results are exhibited by curve and histogram, Plate 15, fig. 14; the
two points on the "observation curve" corresponding to five and six petals are
deduced from the areas given by the statistics by the same percentage reduction as

* Another very serviceable formula is due to Schlömilch. It gives the area of the "tail" of
$y = y_0 x^{-p} e^{-\gamma x}$ from $x = x$ to $x = \infty$ in a rapidly converging series, i.e.,

area = Ml^^l!! / 1 I + K - 'P ^'P' -^' ^) -^. &c. 

7 1 70? + 1 (7a? -h 1) {^ix -j" 2) (70? + 1) (7a? + 2) (7^3 4- 3) 

† 2048 tosses of 10 shillings at a time gave a mean 3 per cent. deviation between theory and
experiment, 100 tosses gave about 9 per cent. The above series corresponds to about 7.2 per cent., and
thus is quite within the range of accuracy of coin-tossing experiments.

converts the theoretical areas into the ordinates of the theoretical curve. For other
petals, ordinates and areas practically coincide in value.
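The areas of the buttercup curve may be checked by quadrature. The sketch below rests on an assumed reading of the damaged equation, namely y = .211225 x^(−.322) (7.3253 − x)^(3.142) with the origin at 4.5 petals; the two exponents are adopted because they reproduce the total of 222 flowers and the tabulated areas, and should be treated as a reconstruction. The substitution t = x^(1−p) is used only to remove the infinite ordinate at the origin before applying Simpson's rule; all names are illustrative.

```python
Y0, P, N_EXP, B = 0.211225, 0.322, 3.142, 7.3253

def area(lo, hi, steps=2000):
    """Area of Y0 x^(-P) (B - x)^N_EXP between lo and hi (Simpson's rule in t = x^(1-P))."""
    k = 1 - P
    t_lo, t_hi = lo ** k, hi ** k
    h = (t_hi - t_lo) / steps

    def g(t):
        x = t ** (1 / k)
        # x^(-P) dx = dt / k, so only the (B - x) factor remains in the integrand.
        return Y0 / k * max(B - x, 0.0) ** N_EXP

    total = g(t_lo) + g(t_hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(t_lo + i * h)
    return total * h / 3

# Groups for 5, 6, ..., 11 petals; the last group runs to the end of the range.
edges = [0, 1, 2, 3, 4, 5, 6, B]
areas = [area(lo, hi) for lo, hi in zip(edges, edges[1:])]
printed = [136.9, 48.5, 22.6, 9.6, 3.4, 0.8, 0.2]

for petals, (calc, given) in enumerate(zip(areas, printed), start=5):
    print(f"{petals:2d} petals: computed {calc:6.2f}, printed {given:6.1f}")
print("total", round(sum(areas), 2))
```

With these assumed exponents the computed areas sum to 222 and each group lies within a few tenths of the printed theoretical frequency.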

(32.) Example XII. Another example of a similar kind may be taken from
Herr de Vries' memoir (loc. cit., p. 202). He cultivated under the name of perumbellatum
a race of Trifolium repens, in which the axis is very frequently prolonged
beyond the head of the flower, and bears one to ten blossoms. In the summer of
1892 he had a bed of such clover, produce of a single plant, and in July counted the
extent of this variation on 630 flowers. In 325 cases the axis, according to de Vries,
had not grown through the head of the flower, in 83 cases it had grown through and
bore one blossom, in 66 cases two blossoms, and so on. The complete statistics are
as follows:



High blossoms     0     1     2     3     4     5     6     7     8     9    10
Frequency       325    83    66    51    36    36    18     7     6     1     1



Taking moments in the manner of the earlier part of this memoir, I found as a first 
approximation to the frequency curve : 

y = 4-5284:2 x"'^^^^^"^ (10-69114 - x)'^'^^^^^\ 

with the origin at .47813 to left of maximum ordinate. This first approximation
seemed to justify three things: (i.) starting at .5 to the left of the maximum ordinate;
(ii.) assuming a range, 11, which just covered the whole series of observations, i.e.,
from −.5 to 10.5; and (iii.) that the moments of the areas might be found from a value
of p not far from .5.

A second approximation was then made, and taking moments round the asymptotic
ordinate, I found:

$$\mu_1' = 1.8680, \qquad \mu_2' = 7.77028,$$

whence, in the manner of § 16, we have:

$$\gamma_1 = .1698182, \qquad \gamma_2 = .3781526,$$

and ultimately:

$$m_1 = -.493118, \qquad m_2 = 1.47797,$$

and

$$y_0 = 4.65148.$$

The equation to the frequency curve is therefore:

$$y = 4.65148\, x^{-.493118}(11 - x)^{1.47797}.$$

The value found for p, i.e., .493, justifies our calculation of the moments on the
assumption that it was .5.

Placing statistics and theory side by side, we have:



High blossoms      0        1        2       3       4       5       6       7       8      9     10
Statistics       325       83       66      51      36      36      18       7       6      1      1
Theory           303.22   106.12    69.99   49.27   35.23   24.93   17.07   10.96    6.27   2.79   .52

The agreement between theory and observation is here all that could be desired,
except in the case of 0 and 1 high blossoms. Here 22 blossoms have in actual
counting been transferred from the theoretical group of 1 to the theoretical group of
zero high blossom. I consider it highly probable that the theory here gives better
results than the actual statistics; and this, for the simple reason that it must be
very difficult to distinguish between any one of the low blossoms and a very slightly
extended axis bearing only one blossom, that is to say, the extension of the axis
passes insensibly into one of the low blossoms, or vice versa, and in a certain proportion
of cases it must be difficult to distinguish between the categories 0 and 1. The comparison
between theory and observation is represented by curve and histogram,
Plate 15, fig. 15.
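The theoretical row for the clover can be reproduced from the fitted equation. The sketch below assumes the curve y = 4.65148 x^(−.493118) (11 − x)^(1.47797) with the asymptotic ordinate half a unit to the left of zero high blossoms, so that the group of k high blossoms occupies the interval from k to k + 1 measured from the asymptote. The first group is taken as an area (by expanding the second factor in a binomial series) and the remaining groups as mid-ordinates, since that is what the printed figures appear to be; the names are illustrative.

```python
Y0, M1, M2, B = 4.65148, -0.493118, 1.47797, 11.0

def ordinate(x):
    """Ordinate of the assumed curve at distance x from the asymptotic ordinate."""
    return Y0 * x ** M1 * (B - x) ** M2

def first_group_area(x, terms=40):
    """Area from the asymptote to x, expanding (B - t)^M2 by the binomial series."""
    total, coeff = 0.0, 1.0        # coeff = binomial(M2, k) * (-1/B)^k
    for k in range(terms):
        total += coeff * x ** (1 + M1 + k) / (1 + M1 + k)
        coeff *= (M2 - k) / (k + 1) * (-1 / B)
    return Y0 * B ** M2 * total

theory = [first_group_area(1.0)] + [ordinate(k + 0.5) for k in range(1, 11)]
printed = [303.22, 106.12, 69.99, 49.27, 35.23, 24.93,
           17.07, 10.96, 6.27, 2.79, 0.52]

for k, (calc, given) in enumerate(zip(theory, printed)):
    print(f"{k:2d} high blossoms: computed {calc:7.2f}, printed {given:7.2f}")
```

The mid-ordinates agree with the printed row to the second decimal place, and the area of the first group falls within a fifth of a unit of the printed 303.22.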

Examples X. to XII. will suffice to illustrate the application of our theory to
extreme cases of skew distribution.

(33.) Example XIII. It must not be supposed that in every case of variation by
units (as in the buttercup and clover examples), the curve will be found to be of
Types I. or III. It is impossible to illustrate, in anything short of a treatise
on statistics, the infinite variety of statistical distributions, but the occurrence of
Type IV. in zoological, as distinguished from botanical measurements, is so persistent
that it seems well to illustrate this for the special case of discontinuous variation.
Professor Weldon has kindly given me the following statistics of dorsal teeth on the
rostrum of 915 ♂ and ♀ specimens of Palæmonetes varians from Saltram Park,
Plymouth.



Teeth.    Cases.
  1          2
  2         18
  3        123
  4        372
  5        349
  6         50
  7          1

The centroid-vertical here lies .313661 of a tooth beyond 4, i.e., at 4.313661 teeth.
The following are the moments about centroid-vertical:

$$\mu_2 = .910906, \qquad \mu_3 = .233908, \qquad \mu_4 = 2.625896,$$

where the unit = 1 tooth.

For the normal curve these give

Standard deviation = .9544,
Maximum ordinate = 382.5.

For the skew curve we have

$$\beta_1 = .072222, \qquad \beta_2 = 3.164684.$$

Hence

$$2\beta_2 - 3\beta_1 - 6 = .112702,$$

or, we have a curve of Type IV. The values of $\beta_1$ and $\beta_2$, however, show that it will
not differ very widely from the normal type.

Proceeding to determine the other constants we find

$$r = 111.398, \qquad \nu = -109.047 \ (\nu \text{ is negative since } \mu_3 \text{ is positive}),$$
$$a = 7.16613, \qquad m = 56.699.$$

Distance of origin from centroid-vertical = 7.0149,

$$\log y_0 = \overline{18}.4431056,$$

$$y = y_0 \cos^{113.398}\theta\; e^{109.047\,\theta}, \qquad x = 7.16613\tan\theta$$

give the form of the curve. This curve, the normal curve, and the observations are
drawn, Plate 13, fig. 16. A comparison of the observations and the normal curve shows
an amount of skewness in the tails of the former, which would be very improbable if
the normal curve really expresses the distribution. The skew curve really accounts
for this divergence and is a sensibly better fit. The mean percentage errors in the
ordinates are for the two cases 8.67 and 3.88. The skew curve is thus an excellent fit.

The discontinuity in these teeth probably corresponds to a hypergeometrical polygon, 
of which the skew curve is a limiting form. 
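The Type IV constants for the teeth statistics can be recomputed from β1, β2 and μ2. The following is a sketch under assumptions: it uses the expressions for r, ν, a and m which belong to the earlier part of the memoir (not restated in this section), takes β1 and β2 as printed, and returns only the magnitude of ν, its sign being settled by the sign of the third moment as remarked in the text. The function name is illustrative.

```python
import math

def type4_constants(mu2, beta1, beta2):
    """Constants of a Type IV curve from mu2, beta1 and beta2 (assumed formulas)."""
    r = 6 * (beta2 - beta1 - 1) / (2 * beta2 - 3 * beta1 - 6)
    root = math.sqrt(16 * (r - 1) - beta1 * (r - 2) ** 2)
    nu_abs = r * (r - 2) * math.sqrt(beta1) / root
    a = 0.25 * math.sqrt(mu2) * root
    m = (r + 2) / 2
    origin_from_centroid = nu_abs * a / r
    return {"criterion": 2 * beta2 - 3 * beta1 - 6, "r": r, "nu_abs": nu_abs,
            "a": a, "m": m, "origin_from_centroid": origin_from_centroid}

teeth = type4_constants(0.910906, 0.072222, 3.164684)
for name, value in teeth.items():
    print(f"{name:22s} {value:.5f}")
```

The criterion comes out at .112702, and the remaining constants agree with those printed above to the places given.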

(34.) Example XIV. Another extremely interesting illustration of skew variation
will be found in the statistics of pauperism for England and Wales, to which my
attention was drawn by Mr. G. U. Yule, who had plotted the statistics from the raw
material provided in Appendix I. of Mr. Charles Booth's 'Aged Poor: Condition.'

In Plate 14, fig. 17, we have 632 unions distributed over a range of pauperism varying
from 100 to 850 per 10,000 of the population for the year 1891. The observations
are at once seen to give a markedly skew distribution. Taking 50 paupers as unit of
variation, we find

$$\mu_2 = 6.31889, \qquad \mu_3 = 6.62465, \qquad \mu_4 = 122.1815,$$
$$\beta_1 = .173942, \qquad \beta_2 = 3.060017.$$

Hence

$$3\beta_1 - 2\beta_2 + 6 = .401791,$$

or the curve is of Type I.

The other constants were found to be

$$r = 28.165013, \qquad \epsilon = 148.0886,$$
$$m_1 = 20.169714, \qquad a_1 = 24.2203,$$
$$m_2 = 5.995305, \qquad a_2 = 7.199312,$$
$$y_0 = 99.9065.$$

Range = 31.4196.

Maximum = .60434 to left of centroid-vertical.

Skewness = .24.

The equation to the curve is thus

$$y = 99.9065\left(1 + \frac{x}{7.199312}\right)^{5.9953}\left(1 - \frac{x}{24.2203}\right)^{20.1697}.$$

For the normal curve,

Standard deviation = 2.514,

Maximum ordinate = 100.301.
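The normal-curve constants quoted for comparison in this and the preceding example follow at once from the second moment and the number of observations. A brief sketch, assuming only that the maximum ordinate of a normal curve of total frequency N and standard deviation σ is N/(σ√(2π)) with the grouping interval taken as the unit; the function name is illustrative.

```python
import math

def normal_curve(total, mu2):
    """Standard deviation and maximum ordinate of the normal curve of equal area."""
    sigma = math.sqrt(mu2)
    return sigma, total / (sigma * math.sqrt(2 * math.pi))

pauper_sd, pauper_max = normal_curve(632, 6.31889)   # unions, unit = 50 paupers
teeth_sd, teeth_max = normal_curve(915, 0.910906)    # prawns, unit = 1 tooth

print(f"pauperism: sd {pauper_sd:.3f}, maximum ordinate {pauper_max:.3f}")
print(f"teeth:     sd {teeth_sd:.4f}, maximum ordinate {teeth_max:.1f}")
```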

Both skew curve and normal curve are drawn on Plate 14, fig. 17. The former is at
once seen to be an excellent fit. We might fairly have simplified our work by taking
zero paupers as the commencement of our range, but preference was given to the more
general results in order to demonstrate that they give no appreciable amount of
"negative pauperism." The range determines a limit of about 15 per cent. as the
greatest possible amount of pauperism. The normal curve is seen to diverge very
widely from the statistics besides giving an appreciable amount (3 to 4 unions) with
"negative pauperism." The point-binomial for these statistics is also figured on the
plate. Its constants are p = .833, q = .167, n = 14.4834, c = 1.70306, the start of
the binomial being 5.81503 to the left of the centroid-vertical: see § 5. The fit is a
very close one, the mean error of ordinate = 5.37, and the suggestiveness of such
results for social problems needs no emphasising.

The case is of peculiar interest, because the statistics of pauperism are known to
give a definite trend to the distribution, i.e., if the statistical curve of pauperism for
1881 be compared with that of 1891, for example, the maximum frequency of the
earlier will be found at a much higher percentage. The whole frequency curve is
sliding across from right to left. Now it is of interest to notice that in this, as
in other cases where the trend of the variation is known a priori, the skew curve is
shifted away from the normal curve in the direction in which variation is taking place
with lapse of time. It is not safe at present to extend this to all biological instances,
but the result suggests, for example, that there is a secular progression towards brachycephaly
in Bavarian skulls (fig. 8), towards reduced antero-lateral margin in crabs
(fig. 4), towards increased height in St. Louis school-girls (fig. 7), and towards long-sightedness
in Marlborough School boys.* I believe most suggestive and important
results might be obtained for the theory of evolution, if we only had the series of
skew curves for a biological case of progressive variation in the same manner as we
have for pauper percentages.

(35.) Example XV. The theoretical resolution of heterogeneous material into 
two components, each having skew variation, is not so hard a problem as might at 
first appear, and I propose to deal at length with the subject later. If there be more 
than two components, the equations become unmanageable. In this case however, if 
the components have rather divergent means, a tentative process will often lead to 
practically useful results. To illustrate this I propose to conclude this paper by an 
example of a mortality curve resolved into its chief components. By a mortality 
curve I understand one in which frequency of death (for 1,000, 10,000, or 100,000 
born in the same year) is plotted up to age. I have worked out the resolution for 
English males, and for French of both sexes. The generally close accordance of the 
results for both cases has given me confidence in their approximate accuracy. The 
method adopted was the following : An attempt was made to fit a generalised 
frequency curve to the old age portion of the whole mortality curve, the constants of 
this curve being determined from the data for four or five selected ages by the method 
of least squares ; the frequency curve so determined was subtracted from the total 
curvCj and a frequency curve fitted by the same method to the tail of the remainder. 
This second component was again subtracted and the process repeated, until the 
remainder left could itself be expressed by a single frequency curve. The com- 
ponents thus obtained were added together, and a tentative process adopted of 
slightly modifying their constants and position^ so that the total areas of the com- 
ponents and of the whole mortality curve coincided. It was soon obvious that no 
very great change either in the constants or position was permissible, if the sum 
of the components was to give the know^n resultant curve, hence I feel very confident 
that whatever be the combination of causes which result in the mortality curve, that 
curve is very approximately to be considered as the compound of five types of 
mortality centering about five different ages. The allied character of the results 
obtained for both French and English statistics confirms this view. 
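
The successive subtraction described above can be illustrated on artificial data. The following is a minimal sketch only: it swaps in normal components fitted exactly through three ordinates (the logarithm of a normal curve being a quadratic in x) in place of the generalised skew curves and the least-squares fit from four or five ages used in the memoir. The two components, their constants and all function names are hypothetical.

```python
import math

def normal_curve(peak, mean, sd):
    """Return a function giving the ordinate of a normal curve at x."""
    return lambda x: peak * math.exp(-((x - mean) ** 2) / (2.0 * sd * sd))

def fit_normal_through(points):
    """Recover (peak, mean, sd) of a normal curve through three (x, y) points.

    log y is quadratic in x, so divided differences give its coefficients.
    """
    (x0, y0), (x1, y1), (x2, y2) = points
    l0, l1, l2 = math.log(y0), math.log(y1), math.log(y2)
    d01 = (l1 - l0) / (x1 - x0)
    d12 = (l2 - l1) / (x2 - x1)
    c2 = (d12 - d01) / (x2 - x0)        # coefficient of x**2
    c1 = d01 - c2 * (x0 + x1)           # coefficient of x
    c0 = l0 - c1 * x0 - c2 * x0 * x0    # constant term
    var = -1.0 / (2.0 * c2)
    mean = c1 * var
    peak = math.exp(c0 + mean * mean / (2.0 * var))
    return peak, mean, math.sqrt(var)

# Hypothetical compound curve: two components with rather divergent means.
true_old = (15.0, 70.0, 10.0)   # (peak, mean age, standard deviation)
true_mid = (5.0, 40.0, 8.0)

def total(x):
    return normal_curve(*true_old)(x) + normal_curve(*true_mid)(x)

# Step 1: fit the old-age tail, where the other component is negligible.
old_fit = fit_normal_through([(x, total(x)) for x in (85.0, 90.0, 95.0)])

# Step 2: subtract the fitted component and fit the remainder.
def remainder(x):
    return total(x) - normal_curve(*old_fit)(x)

mid_fit = fit_normal_through([(x, remainder(x)) for x in (30.0, 35.0, 40.0)])

print([round(v, 3) for v in old_fit], [round(v, 3) for v in mid_fit])
```

With real statistics the tail ordinates carry accidental errors, which is why a least-squares fit over several ages, followed by a tentative adjustment of the constants, is needed in place of the exact three-point fit used here.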

* Dr. Roberts' statistics, which I have reduced to skew curves, but have not reproduced in this
memoir.
Professor Lexis has already suggested that the old age distribution of mortality
is given by a normal curve.* Now, although the rougher French statistics give
a fair approximation to a normal curve, this is not true for English males. The
curve for old age is of Type I., but for all practical purposes it may be treated as one 
of Type III. Whatever be the chief causes of old age mortality, they extend very 
sensibly through middle life, and less sensibly through youth, only becoming inappreciable
in childhood. Hence, if we speak of our first component as the "mortality of
old age," the name is to be understood as referring to a group of causes especially
active in old age mortality, but not excluded from other portions of life. The 
second and third components I found to be skew curves, but so nearly normal that to 
my degree of approximation no stress could be laid on the skewness obtained. The 
fourth component was a markedly skew curve, also closely given by a curve of 
Type III., and corresponding in general shape to the mortality curves of fevers 
peculiarly dangerous in childhood (e.g., diphtheria, scarlet fever, enteric fever, &c.).
These three components I have termed respectively the mortality of middle life, of 
youth, and of childhood. I found it impossible to fit the remainder of the original 
mortality curve with any type of generalised curve, so long as I supposed the 
mortality frequency to commence with birth. I was therefore compelled to suppose 
the set of causes giving rise to "infantile mortality" extended into the period of
gestation, and I obtained a satisfactory fit for the infantile mortality frequency, when
the range of the curve started about .75 of a year before birth. The form taken by
the curve is the extreme type in which the curve is asymptotic to the ordinate of
maximum frequency (cf. Examples X.-XII.). The five fundamental components of
the mortality curve for English males are the following, the numbers referring to
1000 contemporaries, or persons born in the same year:

(A.) Old Age Mortality.

Total frequency = 484.1.

Centroid-vertical at 67 years.

Maximum mortality = 15.2 at 71.5 years.

The equation is†

\[ y = 15.2 \left(1 - \frac{x}{35}\right)^{7.7525} e^{.2215x}, \]

the axis of y being the maximum ordinate and the positive direction of x towards
age. The skewness of the curve = .345, and its range concludes at 106.5 years.

The corresponding French component = 411, but the maximum mortality (16.4)
occurs at 72.5 years.
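
As a numerical check on component (A.), the area under a curve of the form y = y0 (1 - x/a)^p e^(px/a) can be written in closed form as y0 · a · e^p Γ(p + 1) / p^(p+1). The sketch below is a minimal illustration and no part of the memoir. It assumes a = 35 years (the end of the range, 106.5, less the age of maximum, 71.5) and p = 7.7525, and the function name is hypothetical.

```python
import math

def type3_area(y0, a, p):
    """Area under y = y0 * (1 - x/a)**p * exp(p*x/a) for x from -infinity to a.

    Substituting u = 1 - x/a gives a gamma integral:
    area = y0 * a * e**p * Gamma(p + 1) / p**(p + 1).
    Logarithms are used to keep the intermediate values moderate.
    """
    log_area = math.log(y0 * a) + p + math.lgamma(p + 1.0) - (p + 1.0) * math.log(p)
    return math.exp(log_area)

# Assumed readings of the old-age constants (see the lead-in).
Y0 = 15.2           # maximum mortality, deaths per year
A = 106.5 - 71.5    # distance from the maximum to the end of the range
P = 7.7525          # exponent of the power factor

old_age_total = type3_area(Y0, A, P)
gamma_rate = P / A  # exponential coefficient implied by a maximum at the origin

print(round(old_age_total, 1), round(gamma_rate, 4))
```

On these assumptions the area comes out very close to the stated total frequency of 484.1, and the centroid falls 1/gamma_rate, or about 4.5 years, before the maximum, in agreement with the centroid at 67 years.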

* 'Zur Theorie der Massenerscheinungen in der menschlichen Gesellschaft,' § 46, Freiburg, 1877.
† Unit of x = 1 year, unit of y = 1 death per year.
(B.) Mortality of Middle Life.
Total frequency = 173.2.
Centroid-vertical at 41.5 years.
Maximum mortality = 5.4.

The curve is very approximately normal, and has a standard deviation of 12.8
years. The corresponding French component = 180 deaths, standard deviation
12 years, with a maximum of 6 at 45 years.

(C.) Mortality of Youth. 

Total frequency = 50.8.
Centroid-vertical at 22.5 years.
Maximum mortality = 2.6.

The curve is very approximately normal, with a standard deviation of 7.8 years.*
The corresponding French component gives a total mortality of 78, standard deviation
of 6 years, and a maximum of 5.2 at 22.5 years.

The greater and more concentrated French mortality of youth is noteworthy. 
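
For the two nearly normal components (B.) and (C.), the stated maxima can be checked against the totals and standard deviations, since the maximum ordinate of a normal curve of area N and standard deviation σ is N/(σ√(2π)). A minimal sketch, with illustrative names:

```python
import math

def normal_peak(total, sd):
    """Maximum ordinate of a normal frequency curve of given area and standard deviation."""
    return total / (sd * math.sqrt(2.0 * math.pi))

# (total frequency per 1000 born, standard deviation in years), as quoted in the text
components = {
    "English middle life": (173.2, 12.8),
    "French middle life": (180.0, 12.0),
    "English youth": (50.8, 7.8),
    "French youth": (78.0, 6.0),
}

peaks = {name: normal_peak(n, sd) for name, (n, sd) in components.items()}
for name, peak in peaks.items():
    print(f"{name}: {peak:.2f} deaths per year")
```

The four results agree, to the figures printed, with the maxima of 5.4, 6, 2.6 and 5.2 given above.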

(D.) Mortality of Childhood. 
Total frequency = 46.4.
Centroid-vertical at 6.06 years.
Maximum mortality = 9 at 3 years.

The equation to the curve, the axis of y being maximum ordinate, is

\[ y = 9\,(1 + x)^{.3271}\, e^{-.3271x}. \]

Thus the skewness of the curve = .87, and the range commences at 2 years.

The French component appears to be shifted further towards youth. It gives a
total of 47 deaths, centroid at 8.75 years, and a maximum of 5.8 at 5.75 years,
skewness = .71. Childish mortality is therefore, if these results be correct, more
concentrated, and at an earlier age in England than in France.

(E.) Infantile Mortality. 

Total frequency after birth = 245.7.

Maximum frequency after birth occurs in first year and equals 156.2.
The equation to the frequency curve is

\[ y = 236.8\,(x + .75)^{-.5}\, e^{-.75x}, \]

the origin being at birth, the skewness .707, and the centroid at .088 year, = 1 month
nearly, before birth. Taking the corresponding French component, we have a total
frequency after birth of 284, with 186 deaths in the first year of life. Infantile
mortality is therefore considerably greater in France.
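
A simple check on the resolution is that the five English total frequencies should together account for the 1000 contemporaries. The short sketch below adds them and expresses each as a percentage; the dictionary labels are illustrative only.

```python
# Total frequencies of the five English components, per 1000 born, as quoted above
totals = {
    "old age": 484.1,
    "middle life": 173.2,
    "youth": 50.8,
    "childhood": 46.4,
    "infancy (after birth)": 245.7,
}

grand_total = sum(totals.values())
shares = {name: 100.0 * t / grand_total for name, t in totals.items()}

print(round(grand_total, 1))
for name, share in shares.items():
    print(f"{name}: {share:.1f} per cent.")
```

The sum differs from 1000 only in the first decimal place, so the five components leave no sensible remainder.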

* See our § 13 (v.).
If we investigate the areas of our infantile mortality curve, we have the following
deaths:

                          Theory.    Statistics.
1st year of life . . .     156.2       158.5
2nd year of life . . .      53.5        51.2


After this the mortality of childhood begins to sensibly increase the infantile 
mortality. Turning to the "antenatal" portion of the curve, we have the following
results, of course not verifiable from ordinary mortality statistics:

(i.) The total "antenatal" deaths for the 9 months preceding birth are 605 for
every 1000 actually born and registered.

(ii.) "Antenatal" deaths for the 6 months immediately preceding birth are 214 for
every 1000 born.

(iii.) "Antenatal" deaths for the 3 months immediately preceding birth are 83 for
every 1000 born at the proper period. 

The 391 "deaths" of the first three months of pregnancy would not be recorded,
and in many cases possibly pass without notice. The 214 deaths of the remaining
six months would be considered as miscarriages or still-births. The proportion of
1 in 6 of such accidents to births of the normal kind does not appear excessive. On
the average Dr. Galabin says such an occurrence is "the experience of every woman
who has borne children and reached the limit of the child-bearing age." So far then
there appears nothing to contradict our theoretical results in what is known of the 
first six months of antenatal life. 

For the last three months we have more definite data. According to our curve 
we have 83 deaths (per 1000 born) in the last three months before birth, or 83 in 
1083 pregnancies = about 7.7 per cent. Now this percentage must consist of two
factors — still-born children and children who, born before their time, die shortly 
after birth, and who would not be recorded in any proper proportions in statistics 
based on census returns, nor as a rule in the returns of maternity charities. 
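
The areas quoted for the infantile curve, before and after birth, can be recomputed from its equation. The sketch below is a hedged illustration: it assumes the curve reads y = 236.8 (x + .75)^(-.5) e^(-.75x) with x in years from birth, and uses the substitution t = √(x + .75), which reduces the area to an error function. The names are hypothetical.

```python
import math

# Assumed reading of the infantile curve: y = Y0 * (x + SHIFT)**-0.5 * exp(-RATE * x)
Y0, SHIFT, RATE = 236.8, 0.75, 0.75

def infant_deaths(x_from, x_to):
    """Deaths per 1000 born between ages x_from and x_to (years, birth = 0)."""
    scale = Y0 * math.exp(RATE * SHIFT) * math.sqrt(math.pi / RATE)

    def primitive(x):
        # Integral of the curve from the start of the range up to x, divided by scale
        return math.erf(math.sqrt(RATE * (x + SHIFT)))

    return scale * (primitive(x_to) - primitive(x_from))

first_year = infant_deaths(0.0, 1.0)
second_year = infant_deaths(1.0, 2.0)
after_birth = infant_deaths(0.0, math.inf)
antenatal_9m = infant_deaths(-0.75, 0.0)
antenatal_6m = infant_deaths(-0.50, 0.0)
antenatal_3m = infant_deaths(-0.25, 0.0)
first_3m_of_gestation = antenatal_9m - antenatal_6m
late_loss_percent = 100.0 * antenatal_3m / (1000.0 + antenatal_3m)

for label, value in [("1st year", first_year), ("2nd year", second_year),
                     ("after birth", after_birth), ("9 months before", antenatal_9m),
                     ("6 months before", antenatal_6m), ("3 months before", antenatal_3m)]:
    print(f"{label}: {value:.1f}")
```

On this reading the theoretical figures of 156.2 and 53.5, the total of 245.7 after birth, the antenatal figures of 605, 214 and 83, and the percentage of about 7.7 are all reproduced to the accuracy printed.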

For statistics of still-births, I find:

                                                                   per cent.
Dublin Rotunda Hospital (1847-54)                                    6.9
Dublin Rotunda Hospital (1871-75)                                    6.1
Dr. J. H. Davis for 14,000 births for a large maternity charity
  in St. Pancras                                                     4
Guy's Hospital Lying-in Charity, 25,777 births, 1,127 born
  dead or died within a few hours, 1000 corresponding to
  births in the last three months of pregnancy                       3.84
Newsholme's "Vital Statistics" (no authority cited)                  4

It would thus appear that there are 4 to 5 per cent. of still-births, thus leaving
2.7 to 3.7 per cent. of deaths to be accounted for, if there is any validity in our
analysis, by deaths of children born before their proper time and dying before
their proper birthdays. Such deaths would not appear in the category of still-born
children in the returns of the maternity charities, nor in any true proportion in the
census returns. 

Thus, while it is impossible to assert any validity for the antenatal part of our
curve of infantile mortality, while, indeed, the constants of that curve, and consequently
the percentages of antenatal deaths, might be considerably modified had
we surer data of the actual deaths in the first year of life; still there appears to be
nothing wildly impossible in the results obtained, and they may at any rate be
suggestive, if only as to the nature of those statistics of "antenatal" deaths, which
it would be of the greatest interest to procure. 

The absolute necessity of skew curves in all questions of vital statistics is sufficiently 
evidenced in this resolution of the general mortality curve. A complete picture of the 
resolution into components of the mortality curve is given (Plate 16, fig. 18), with
a separate figure on an enlarged scale of infantile mortality. 

(36.) In conclusion, there are several points on which it seems worth while to insist. 
The normal curve of errors connotes three equally important principles:

(i.) An indefinitely great number of "contributory" causes.

(ii.) Each contributory cause is in itself equally likely to give rise to a deviation of
the same magnitude in excess and defect.

(iii.) The contributory causes are independent.

The frequency of each possible number of heads in repeatedly throwing several 
hundred coins in a group together, practically fulfils all the above three conditions. 

Condition (ii.) is not, however, fulfilled if a number of dice be thrown or a number 
of teetotums of the same kind be spun together. Condition (iii.) is still fulfilled. 
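
The contrast between coins and dice may be put in figures. The sketch below is an illustration only: it uses the third-moment coefficient of skewness, (q - p)/√(npq), for the number of successes in n independent trials, which is not the measure of skewness by mean and maximum used elsewhere in this memoir. The choice of 200 coins and 200 dice is hypothetical.

```python
import math

def binomial_skewness(n, p):
    """Third-moment skewness of the number of successes in n independent trials."""
    q = 1.0 - p
    return (q - p) / math.sqrt(n * p * q)

def binomial_third_moment(n, p):
    """Third central moment summed directly over the binomial probabilities."""
    mean = n * p
    return sum(math.comb(n, k) * p**k * (1.0 - p)**(n - k) * (k - mean)**3
               for k in range(n + 1))

coins = binomial_skewness(200, 1 / 2)  # heads among 200 coins: condition (ii.) holds
dice = binomial_skewness(200, 1 / 6)   # sixes among 200 dice: condition (ii.) fails

print(coins, round(dice, 4))
print(round(binomial_third_moment(200, 1 / 6), 3))
```

The coins give a skewness of zero for every n, while the dice retain a positive skewness which diminishes only as the inverse square root of n.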

Condition (iii.) is not fulfilled if p cards be drawn out of a pack of nr cards containing
r equal suits, supposing the p cards to be drawn at one time. Now, it appears to
me that we cannot say a priori whether the example of tossing, of teetotum-spinning,
or of card-drawing is more likely to fit the proceedings of nature. There
is, I think, now sufficient evidence to show that the conditions (i.) to (iii.) are not
fulfilled, or not exactly fulfilled, in many cases: in economic, in physical, in
zoometric, and botanical statistics. We are, therefore, justified in seeing what results
we shall obtain by supposing one or more of the above conditions which lead to the
normal curve to be suspended. The analogy of teetotums and cards leads us to a
system of skew frequency curves which in this paper have been shown to give a very
close approximation to observed frequency in a wide number of cases, an approximation
quite as close as the writer has himself obtained between theory and
experiment in very wide experiments in tossing, card-drawing, ball-drawing, and
lotteries. But the introduction of these skew curves leads us to two important
conclusions:

(i.) If a material be heterogeneous we have no right to suppose it must be made up 
of groups of homogeneous material each obeying the normal law of distribution. Each 
homogeneous group may follow its own skew distribution. 

(ii.) If material obeys a law of skew distribution, the theory of correlation as
developed by Galton and Dickson requires very considerable modification.

We may note two points bearing on these two conclusions, which do not seem 
without interest for the general problem of evolution. Fever mortality curves are 
skew curves. The general mortality curve (frequency of death at different ages)
is a compound of many diseases, but with sufficient approximation, it can be resolved
into five components; three of these components are markedly skew, the other two
less so. Selection, according to age, is thus distributed with different degrees of
skewness about five stages in life; this at least suggests that selection according to
the size or weight of an organ may be compound, if we take a considerable range of
size, and that the components may have varying degrees of skewness. 

The correlation of the ages of husband and wife at marriage is a subject with 
regard to which we have a very fair amount of material. For a given age of the 
husband, the frequency of marriage with the age of the wife fits very closely a curve 
of Type IV., and with sufficient exactness very often a curve of Type III.* The
sections of the surface of frequency are oval curves differing entirely from the ellipses
of the Galton-Dickson theory, but resembling in general the "oval" polygons
obtained by taking horizontal sections of the frequency polyhedron for the correlation
of cards of the same suit in two players' hands at whist. Plate 9, fig. 19, shows how
widely these differ from ellipses. There seems therefore to be considerable danger
in assuming in vital statistics, whether in man or the lower animals, that the "contributory"
causes are independent. All the statistics for sizes of organs in animals,
which I have yet analysed, if they are not compound, seem to agree in following a curve
of Type IV., and suggest this kind of inter-dependence of the "contributory" causes.
Their correlation surfaces of frequency will thus have for lines of level skew ovals,
what for want of a better name may be termed "whist ovals," as distinguished
from the ellipses which flow from the normal frequency surface. The remarks from 
quite a different standpoint of Ranke on skull measurements seem to lead to the 
same conclusion. I propose on another occasion to illustrate the resolution of 
compound curves into skew components, and further to deal with the main features 
of correlation in cases of a skew frequency distribution. 



* I have fitted some of Perozzo's marriage statistics with skew curves, but reserve their discussion
for the present, as they belong properly to the theory of skew correlation.
412 MR. K. PEARSON ON THE MATHEMATICAL THEORY OF EVOLUTION.



Note. 

Added May 24, 1895. 

[Since writing the above memoir my attention has been drawn to a note in
Dr. Westergaard's 'Theorie der Statistik,' referring to Professor T. N. Thiele's
treatment of skew frequency curves. I have procured and read his book, 'Forelæsninger
over Almindelig Iagttagelseslære,' Kjøbenhavn, 1889. It seems to me a very
valuable work, and is, I think, suggestive of several lines for new advance. It does
not cover any of the essential parts of the present memoir. Dr. Thiele does indeed
suggest the formation of certain "half-invariants," which are functions of the higher
moments of the observations, quantities corresponding to the $\mu_4 - 3\mu_2^2$, $\mu_5 - 10\mu_2\mu_3$,
&c., of the above memoir. He further states (pp. 21-2) that a study of these half-invariants
for any series of observations would provide us with information as to the
nature of the frequency distribution. They are not used, however, to discriminate
between various types of generalised curves, nor to calculate the constants of such
types. A method is given of expressing any frequency distribution by a series of
differences of inverse factorials with arbitrary constants. Thus if



\[ \beta_n(x) = \frac{n!}{x!\,(n-x)!} \]

and

\[ \Delta\beta_n(x) = \beta_n(x + \tfrac{1}{2}) - \beta_n(x - \tfrac{1}{2}), \]

we can express any law of frequency y = f(x) by

\[ f(x) = b_0\,\beta_n(x) + b_1\,\Delta\beta_{n-1}(x) + \ldots + b_n\,\Delta^{n}\beta_0(x), \]

where the constants $b_0, b_1, \ldots, b_n$ can be determined numerically when the frequency
of n + 1 chosen derivation-elements is known.

I see a possibility of more than one theoretical development of interest, especially 
in relation to compound material, from this development of Dr. Thiele's, but I doubt 
whether it can be of practical statistical service even as an empirical expression for 
frequency. Instead of having the 3 to 5 constants of our generalised curves, the full 
value of Dr. Thiele's expression requires as many constants as there are recorded 
frequencies, and then expresses the result in functions like $\Delta^{s}\beta_{n-s}(x)$, by no means easily
realised or likely to appeal to the practical statistician. It is true the complete series
gives absolutely accurately the frequency of all the points used in the calculation, but
it does not, like the generalised curves, indicate the purely accidental variations of
the frequency. If, on the other hand, we take, as Dr. Thiele suggests, some half-dozen
terms only of the series (which give the really essential character of the
frequency), we obtain results which, although more complex in form, are not as satisfactory
as those given by the generalised curve.

For example, Dr. Thiele gives the following series (p. 12):

Values . . . .    7    8    9   10   11   12   13   14   15   16   17   18   19
Frequency . . .   3    7   35  101   89   94   70   46   30   15    4    5    1

His "Faktiske Fejllove" gives

\[
\begin{aligned}
y = {} & .1221\,\beta_{12}(x) + .273\,\Delta\beta_{11}(x) + .600\,\Delta^{2}\beta_{10}(x) \\
& + .216\,\Delta^{3}\beta_{9}(x) + .278\,\Delta^{4}\beta_{8}(x) - .318\,\Delta^{5}\beta_{7}(x) \\
& + .574\,\Delta^{6}\beta_{6}(x) + .596\,\Delta^{7}\beta_{5}(x) + .499\,\Delta^{8}\beta_{4}(x) \\
& + .259\,\Delta^{9}\beta_{3}(x) - .0645\,\Delta^{10}\beta_{2}(x) - .0303\,\Delta^{11}\beta_{1}(x) \\
& - .0088\,\Delta^{12}\beta_{0}(x).
\end{aligned}
\]

He tells us that 6 terms practically suffice, the additional terms merely accounting
for the individual irregularities of this particular 500 observations. Without specifying
what the observations are, he tells us that the possible values run from 4 to 28,
or that the range is really limited.
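
The leading coefficient of the series admits a simple check, offered here as a sketch under stated assumptions rather than as Dr. Thiele's own procedure. If β_n(x) is read as the binomial coefficient n!/(x!(n - x)!), and the 13 recorded values are taken to correspond to x = 0, 1, ..., 12, the leading term .1221 β_12(x) alone sums to .1221 × 2^12.

```python
import math

N = 12
B0 = 0.1221  # leading coefficient as printed in the series

# beta_12(x) for x = 0..12, ASSUMING beta_n(x) is the binomial coefficient
beta_12 = [math.comb(N, x) for x in range(N + 1)]

leading_total = B0 * sum(beta_12)         # equals B0 * 2**12
leading_term = [B0 * b for b in beta_12]  # a symmetrical binomial polygon

print(sum(beta_12), round(leading_total, 1))
print([round(v, 1) for v in leading_term])
```

The total is within a small fraction of a unit of the 500 observations, which supports the reading of β_n(x) adopted here. The leading term is symmetrical, and the skewness of the series is carried wholly by the difference terms.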

If we fit our generalised curve of Type I., we find for its equation:

\[ y = 98.801 \left(1 + \frac{x}{4.5191}\right)^{3.89708} \left(1 - \frac{x}{20.0296}\right)^{17.27285}; \]

the origin is at 11.191, or the range runs from 6.6715 to 31.2202, i.e., is a range of
24.5487 instead of 25, but is shifted some 2 to 3 units. Considering the small
number of observations, this is not a bad approximation to a marked feature of the
distribution not indicated on the surface by the observations, nor discoverable from
the "Faktiske Fejllove."

Comparing our curve (i.) with (ii.) the actual statistics (all 13 terms of the
"Faktiske Fejllove" series), and with (iii.) the first 6 terms of the same series, we
have the following results:

Values . .    7    8    9   10   11   12   13   14   15   16   17   18   19
(i.) . . .    1   10   42   80   99   92   70   48   29   15    6    3    1
(ii.)  . .    3    7   35  101   89   94   70   46   30   15    4    5    1
(iii.) . .    1   11   40   82  103   92   70   48   26   13    8    4    1

The generalised curve here gives slightly the better results in addition to its more
easily realised form, and its fewer constants (iv.).
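
The closeness of the two fits may be summarised by the deviations of rows (i.) and (iii.) from the observed row (ii.). The sketch below takes the three rows as read from the table of values 7 to 19; the choice of absolute and squared deviations as the measures of misfit is an assumption of this illustration and is not stated in the memoir.

```python
values = list(range(7, 20))
curve_type1 = [1, 10, 42, 80, 99, 92, 70, 48, 29, 15, 6, 3, 1]    # row (i.)
observed = [3, 7, 35, 101, 89, 94, 70, 46, 30, 15, 4, 5, 1]       # row (ii.)
thiele_six = [1, 11, 40, 82, 103, 92, 70, 48, 26, 13, 8, 4, 1]    # row (iii.)

def misfit(fitted, actual):
    """Return (sum of absolute deviations, sum of squared deviations)."""
    diffs = [f - a for f, a in zip(fitted, actual)]
    return sum(abs(d) for d in diffs), sum(d * d for d in diffs)

abs_i, sq_i = misfit(curve_type1, observed)
abs_iii, sq_iii = misfit(thiele_six, observed)

print("Type I curve:  ", abs_i, sq_i)
print("six-term series:", abs_iii, sq_iii)
```

By either measure the Type I curve stands slightly nearer to the observations than the six-term series.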
On the other hand, there are, I think, some points of first-class theoretical importance
in the mode adopted by Dr. Thiele for expressing frequency; it gives us a
means of expanding all varieties of frequency curves in a series of factorial functions
which may lead to important theorems in the analysis of heterogeneous material.]



Plates. 

The scale of the accompanying figures is not that of the original drawings, and the 
clearness and distinctness of the several curves of the same figure have been, in 
several instances, partially lost by the process of reproduction and reduction. In 
every case the square element of the figure corresponds to the square centimetre of 
the original diagram, and is spoken of both in the text of the memoir and on the 
figures themselves as a square centimetre. The scale of actual reduction is indicated 
by a fraction placed at the lower right-hand corner of the figure. 
DISTRIBUTION OF 109,966 DIVORCES ACCORDING TO DURATION OF MARRIAGE. UNITED STATES, mz-86 (WEWilcox)